Preface


The target audience of this book is computer scientists, system architects,
knowledge engineers and programmers who face the problem of combining
various inputs into a single output. Our intent is to provide them with an
easy-to-use guide to possible ways of aggregating input values given on a
numerical scale, and to ways of choosing or constructing aggregation functions
for their specific applications.

A prototypical user of this guide is a software engineer who works on 
building an expert or decision support system, and is interested in how to 
combine information coming from different sources into a single numerical 
value, which will be used to rank the alternative decisions. Building such a
system is so complex that one cannot undertake a detailed study of the
relevant mathematical literature, and is instead interested in a simple
off-the-shelf solution to one of the many problems in this work.

We present the material in such a way that its understanding does not re- 
quire specific mathematical background. All the relevant notions are explained 
in the book (in the introduction or as footnotes), and we avoid referring to 
advanced topics (such as algebraic structures) or using pathological examples 
(such as discontinuous functions). While mathematically speaking these top- 
ics are important, they are well explained in a number of other publications 
(some are listed at the end of the introduction). Our focus is on practical 
applications, and our aims are conciseness, relevance and quick applicability. 

We treat aggregation functions which map several inputs from the interval
[0,1] to a single output in the same interval. This is by no means the
only possible framework for aggregating the inputs or performing information
fusion: in many cases the inputs are in fact discrete or binary. However, it
is often possible and useful to map them into the unit interval, for example
using degrees of membership in fuzzy sets. As we shall see, even in this
simplified framework the theory of aggregation is very rich, so choosing the
right operation is still a challenge.

As we mentioned, we present only the most important mathematical properties,
which can be easily interpreted by practitioners. Thus effectively
this book is an introduction to the subject. Yet we try to cover a very broad
range of aggregation functions, and present some state-of-the-art techniques,
typically at the end of each section.

Chapter 1 gives a broad introduction to the topic of aggregation functions.
It covers important general properties and lists the most important
prototypical examples: means, ordered weighted averaging (OWA) functions, 
Choquet integrals, triangular norms and conorms, uninorms and nullnorms. 
It addresses the problem of choosing the right aggregation function, and also 
introduces a number of basic numerical tools: methods of interpolation and 
smoothing, linear and nonlinear optimization, which will be used to construct 
aggregation functions from empirical data. 

Chapters 2-4 give a detailed discussion of the four broad classes of
aggregation functions: averaging functions (Chapter 2), conjunctive and
disjunctive functions (Chapter 3) and mixed functions (Chapter 4). Each class
has many distinct families, and each family is treated in a separate section.
We give a formal definition, discuss important properties and their
interpretation, and also present specific methods for fitting a particular
family to empirically collected data. We also provide examples of computer
code (in the C++ language) for calculating the value of an aggregation
function, various generalizations, advanced constructions and pointers to
specific literature.

In Chapter 5 we discuss the general problem of fitting chosen aggregation
functions to empirical data. We formulate a number of mathematical program- 
ming problems, whose solution provides the best aggregation function from a 
given class which fits the data. We also discuss how to evaluate suitability of 
such functions and measure consistency with the data. 

In Chapter 6 we present a new type of interpolatory aggregation functions.
These functions are constructed based on empirical data and some general 
mathematical properties, by using interpolation or approximation processes. 
The aggregation functions are general (i.e., they typically do not belong to 
any specific family), and are not expressed via an algebraic formula but rather 
a computational algorithm. While they may lack certain interpretability, they 
are much more flexible in modeling the desired behavior of a system, and 
numerically as efficient as an algebraic formula. These aggregation functions 
are suitable for computer applications (e.g., expert systems) where one can 
easily specify input-output pairs and a few generic properties (e.g., symmetry, 
disjunctive behavior) and let the algorithm build the aggregation functions 
automatically. 

The final Chapter 7 outlines a few classes of aggregation functions not
covered elsewhere in this book, and presents various additional properties 
that may be useful for specific applications. It also provides pointers to the 
literature where these issues are discussed in detail. 

Appendix A outlines some of the methods of numerical approximation and
optimization that are used in the construction of aggregation functions, and
provides references to their implementation. Appendix B contains a number
of problems that can be given to undergraduate and graduate students.




This book comes with a software package AOTool, which can be freely
downloaded from http://www.deakin.edu.au/~gleb/aotool.html. AOTool
implements a large number of methods for fitting aggregation functions (ei- 
ther general or from a specific class) to empirical data. These methods are 
described in the relevant sections of the book. AOTool allows the user to load 
empirical data (in spreadsheet format), to calculate the parameters of the best 
aggregation function which fits these data, and save these parameters for fu- 
ture use. It also allows the user to visualize some two-dimensional aggregation 
functions. 

We reiterate that this book is oriented towards practitioners. While a basic
understanding of aggregation functions and their properties is required for
their successful usage, the examples of computer code and the software
package for building these functions from data allow the reader to implement
most aggregation functions in no time. This takes the burden of implementation
off the users, and allows them to concentrate on building their specific system.


Melbourne, Móstoles, Alcala de Henares Gleb Beliakov 
May 2007 Ana Pradera 
Tomasa Calvo 


Notations used in this book 


the set of real numbers;
the set {1,2,...,n};
the power set (i.e., the set of all subsets of the set X);
the complement of the set A;
$\mathbf{x}$  n-dimensional real vector, usually from $[0,1]^n$;
$\langle \mathbf{x}, \mathbf{y} \rangle$  scalar (or dot) product of vectors x and y;
$\mathbf{x}_{\searrow}$  permutation of the vector x which arranges its components in
non-increasing order;
$\mathbf{x}_{\nearrow}$  permutation of the vector x which arranges its components in
non-decreasing order;
$f_n$  a function of n variables, usually $f_n : [0,1]^n \to [0,1]$;
$F$  an extended function $F : \bigcup_{n \in \{1,2,\ldots\}} [0,1]^n \to [0,1]$;
$f \circ g$  the composition of functions f and g;
$g^{-1}$  the inverse of the function g;
the pseudo-inverse of the function g;
a strong negation function;
$v$  a fuzzy measure;
$\mathbf{w}$  a weighting vector;
the natural logarithm;
Averaging functions 
arithmetic mean of x; 
weighted arithmetic mean of x with the weighting vector w; 
weighted power mean of x with the weighting vector w and 
exponent r; 
weighted quasi-arithmetic mean of x with the weighting vector 
w and generator g; 
median of x; 
a-median of x; 
quadratic mean of x; 
weighted quadratic mean of x with the weighting vector w; 
harmonic mean of x;
weighted harmonic mean of x with the weighting vector w;
geometric mean of x;
weighted geometric mean of x with the weighting vector w;
ordered weighted average of x with the weighting vector w;
quadratic ordered weighted average of x with the weighting vector w;
geometric ordered weighted average of x with the weighting vector w;
discrete Choquet integral of x with respect to the fuzzy measure v;
discrete Sugeno integral of x with respect to the fuzzy measure v;

Conjunctive and disjunctive functions

$T$  triangular norm;
$S$  triangular conorm;
$T_P, T_L, T_D$  the basic triangular norms (product, Łukasiewicz and drastic
product);
$S_P, S_L, S_D$  the basic triangular conorms (dual product, Łukasiewicz and
drastic sum);
$C$  copula;

Mixed functions

$U$  uninorm;
$V$  nullnorm;
$U_{T,S,e}$  uninorm with underlying t-norm T, t-conorm S and neutral
element e;
$V_{T,S,a}$  nullnorm with underlying t-norm T, t-conorm S and absorbent
element a;
$E_{\gamma,T,S}$  exponential convex T-S function with parameter $\gamma$ and t-norm
and t-conorm T and S;
$L_{\gamma,T,S}$  linear convex T-S function with parameter $\gamma$ and t-norm and
t-conorm T and S;
$Q_{\gamma,T,S,g}$  T-S function with parameter $\gamma$, t-norm and t-conorm T and S
and generator g;
$O_{S,\mathbf{w}}$  S-OWA function with the weighting vector w and t-conorm S;
$O_{T,\mathbf{w}}$  T-OWA function with the weighting vector w and t-norm T;
$O_{S,T,\mathbf{w}}$  ST-OWA function with the weighting vector w, t-conorm S and
t-norm T.
Acronyms and Abbreviations


LAD least absolute deviation 
LP linear programming 

LS least squares 

OWA ordered weighted averaging 
QP quadratic programming 
t-norm triangular norm 

t-conorm triangular conorm 

s.t. subject to 

w.r.t. with respect to 


WOWA weighted ordered weighted averaging 
Introduction


1.1 What is an aggregation function? 


A mathematical function is a rule which takes an input (a value called the
argument) and produces an output (another value). Each input has a unique
output associated with it. A function is typically denoted by $y = f(x)$, where x
is the argument and y is the value. The argument x can be a vector, i.e., a
tuple of size n: $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, and $x_1, x_2, \ldots, x_n$ are called the
components of x.

It is important to understand that f can be represented in several ways: 


a) as an algebraic formula (say, $f(\mathbf{x}) = x_1 + x_2 - x_3$),

b) as a graph of the function (e.g., as 2D, 3D plot, or contour plot), 

c) verbally, as a sequence of steps (e.g., take the average of components of 
x), or more formally, as an algorithm, 

d) as a lookup table, 

e) as a solution to some equation (algebraic, differential, or functional), 

f) as a computer subroutine that returns a value y for any specified x (so 
called oracle). 


Some representations are more suitable for visual or mathematical analysis,
whereas for use in a computer program all representations (except
graphical) are equivalent, as they are all eventually converted to
representation f). Some people mistakenly think of functions only as algebraic
formulas. We will not distinguish between functions based on their
representation: f can be given in any of the ways mentioned. What is important,
though, is that f consistently returns the same, unique value for any given x.

Aggregation functions are functions with special properties. In this book we
only consider aggregation functions that take real arguments from the closed
interval [0,1] and produce a real value in [0,1].¹ This is usually denoted as
$f : [0,1]^n \to [0,1]$ for functions that take arguments with n components.


1 The interval [0, 1] can be substituted with any interval [a,b] using a simple trans- 
formation, see Section 




The purpose of aggregation functions (they are also called aggregation
operators;² both terms are used interchangeably in the existing literature)
is to combine inputs that are typically interpreted as degrees of membership
in fuzzy sets, degrees of preference, strength of evidence, or support of a
hypothesis, and so on. Consider these prototypical examples.


Example 1.1 (A multicriteria decision making problem). There are two (or
more) alternatives, and n criteria to evaluate each alternative (or rather a
preference for each alternative). Denote the scores (preferences) by
$x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ for the alternatives x and y respectively.
The goal is to combine these scores using some aggregation function f, and to
compare the values $f(x_1, x_2, \ldots, x_n)$ and $f(y_1, y_2, \ldots, y_n)$ to decide on the
winning alternative.


Example 1.2 (Connectives in fuzzy logic). An object x has partial degrees of
membership in n fuzzy sets, denoted by $\mu_1, \mu_2, \ldots, \mu_n$. The goal is to obtain
the overall membership value in the combined fuzzy set $\mu = f(\mu_1, \mu_2, \ldots, \mu_n)$.
The combination can be the set operation of union, intersection, or a more
complicated (e.g., composite) operation.


Example 1.3 (A group decision making problem). There are two (or more)
alternatives, and n decision makers or experts who express their evaluation
of each alternative as $x_1, x_2, \ldots, x_n$. The goal is to combine these evaluations
using some aggregation function f, to obtain a global score $f(x_1, x_2, \ldots, x_n)$
for each alternative.


Example 1.4 (A rule based system). The system contains rules of the form³

If $t_1$ is $A_1$ AND $t_2$ is $A_2$ AND ... $t_n$ is $A_n$ THEN ...

$x_1, x_2, \ldots, x_n$ denote the degrees of satisfaction of the rule predicates $t_1$ is $A_1$,
$t_2$ is $A_2$, etc. The goal is to calculate the overall degree of satisfaction of the
combined predicate of the rule antecedent $f(x_1, x_2, \ldots, x_n)$.


The input value 0 is interpreted as no membership, no preference, no 
evidence, no satisfaction, etc., and naturally, an aggregation of n 0s should 
yield 0. Similarly, the value 1 is interpreted as full membership (strongest 
preference, evidence), and an aggregation of 1s should naturally yield 1. This 
implies a fundamental property of aggregation functions, the preservation of 
the bounds 


2 In Mathematics, the term operator is reserved for functions f : X — Y, whose 
domain X and co-domain Y consist of more complicated objects than sets of 
real numbers. Typically both X and Y are sets of functions. Differentiation and 
integration operators are typical examples, see, e.g., P56. Therefore, we shall use 
the term aggregation function throughout this book. 

3 As a specific example, consider the rules in a fuzzy controller of an air conditioner: 
If temperature is HIGH AND humidity is MEDIUM THEN... 
$$f(\underbrace{0,0,\ldots,0}_{n\text{ times}}) = 0 \quad \text{and} \quad f(\underbrace{1,1,\ldots,1}_{n\text{ times}}) = 1. \tag{1.1}$$


The second fundamental property is the monotonicity condition. Consider
aggregation of two inputs x and y, such that $x_1 < y_1$ and $x_j = y_j$ for all
$j = 2,\ldots,n$, e.g., $\mathbf{x} = (a,b,b,b)$, $\mathbf{y} = (c,b,b,b)$, $a < c$. Think of the j-th
argument of f as the degree of preference with respect to the j-th criterion,
and x and y as vectors representing two alternatives A and B. Thus B is
preferred to A with respect to the first criterion, and we equally prefer the
two alternatives with respect to all other criteria. Then it is not reasonable
to prefer A to B. Of course the numbering of the criteria is not important, so
monotonicity holds not only for the first but for any argument $x_i$.

For example, consider buying an item in a grocery store. There are two 
grocery shops close by (the two alternatives are whether to buy the item in 
one or the other shop), and the item costs less in shop A. The two criteria 
are the price and distance to the shop. We equally prefer the two alternatives 
with respect to the second criterion, and prefer shop A with respect to the 
price. After combining the two criteria, we prefer buying in shop A and not 
shop B. 

The same reasoning applies in other interpretations of aggregation func- 
tions and their arguments, e.g., combination of membership values in fuzzy 
sets. Given that two objects have the same membership value in all but one of 
the fuzzy sets, and object A has greater membership in the remaining fuzzy 
set than object B, the overall membership of A in the combined fuzzy set is 
no smaller than that of B. Mathematically, (non-decreasing) monotonicity in 
all arguments is expressed as 


$$x_i \le y_i \text{ for all } i \in \{1,\ldots,n\} \text{ implies } f(x_1,\ldots,x_n) \le f(y_1,\ldots,y_n). \tag{1.2}$$


We will frequently use the vector inequality $\mathbf{x} \le \mathbf{y}$, which means that each
component of x is no greater than the corresponding component of y. Thus,
non-decreasing monotonicity can be expressed as $\mathbf{x} \le \mathbf{y}$ implies $f(\mathbf{x}) \le f(\mathbf{y})$.
Condition (1.2) is equivalent to the condition that each univariate function
$f_{\mathbf{x}}(t) = f(\mathbf{x})$, with $t = x_i$ and the rest of the components of x being fixed, is
monotone non-decreasing in t.

The monotonicity in all arguments and preservation of the bounds are the 
two fundamental properties that characterize general aggregation functions. If 
any of these properties fails, we cannot consider function f as an aggregation 
function, because it will provide inconsistent output when used, say, in a 
decision support system. All the other properties discussed in this book define 
specific classes of aggregation functions. We reiterate that an aggregation 
function can be given in any of the forms a)-f) listed at the start of this section.
Definition 1.5 (Aggregation function). An aggregation function is a
function of n > 1 arguments that maps the (n-dimensional) unit cube onto
the unit interval, $f : [0,1]^n \to [0,1]$, with the properties

(i) $f(\underbrace{0,0,\ldots,0}_{n\text{ times}}) = 0$ and $f(\underbrace{1,1,\ldots,1}_{n\text{ times}}) = 1$.

(ii) $\mathbf{x} \le \mathbf{y}$ implies $f(\mathbf{x}) \le f(\mathbf{y})$ for all $\mathbf{x}, \mathbf{y} \in [0,1]^n$.
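The two properties in Definition 1.5 can be probed numerically for a candidate function. The following is only a minimal C++ sketch: the names `Aggregator`, `preserves_bounds` and `is_monotone_on_grid` are hypothetical and introduced here for illustration, and a grid test can reveal violations of monotonicity but cannot prove that the property holds everywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical alias for a candidate function f : [0,1]^n -> [0,1].
using Aggregator = std::function<double(const std::vector<double>&)>;

// Property (i): f(0,...,0) = 0 and f(1,...,1) = 1.
bool preserves_bounds(const Aggregator& f, std::size_t n) {
    assert(n >= 1);
    const std::vector<double> zeros(n, 0.0), ones(n, 1.0);
    return f(zeros) == 0.0 && f(ones) == 1.0;
}

// Property (ii), tested only on a regular grid with steps+1 nodes per axis:
// increasing any single coordinate by one grid step must not decrease f.
bool is_monotone_on_grid(const Aggregator& f, std::size_t n, int steps) {
    assert(n >= 1 && steps >= 1);
    const double eps = 1e-12;  // assumed tolerance for rounding errors
    std::vector<int> idx(n, 0);
    std::vector<double> x(n, 0.0);
    for (;;) {
        for (std::size_t j = 0; j < n; ++j)
            x[j] = static_cast<double>(idx[j]) / steps;
        const double fx = f(x);
        for (std::size_t i = 0; i < n; ++i) {
            if (idx[i] == steps) continue;  // already at the upper bound 1
            const double saved = x[i];
            x[i] = static_cast<double>(idx[i] + 1) / steps;
            const bool ok = fx <= f(x) + eps;
            x[i] = saved;
            if (!ok) return false;
        }
        // Advance idx like an odometer so that every grid node is visited once.
        std::size_t j = 0;
        while (j < n && idx[j] == steps) idx[j++] = 0;
        if (j == n) break;
        ++idx[j];
    }
    return true;
}
```

The number of grid nodes grows as (steps+1)^n, so a check of this kind is only practical for a small number of arguments.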


It is often the case that aggregation of inputs of various sizes has to be 
considered in the same framework. In some applications, input vectors may 
have a varying number of components (for instance, some values can be miss- 
ing). In theoretical studies, it is also often appropriate to consider a family of 
functions of n = 2,3,... arguments with the same underlying property. The 
following mathematical construction of an extended aggregation function 
allows one to define and work with such families of functions of any number 
of arguments. 


Definition 1.6 (Extended aggregation function). An extended aggregation
function is a mapping

$$F : \bigcup_{n \in \{1,2,\ldots\}} [0,1]^n \to [0,1],$$

such that the restriction of this mapping to the domain $[0,1]^n$ for a fixed n is
an n-ary aggregation function f, with the convention $F(x) = x$ for n = 1.


Thus, in simpler terms, an extended aggregation function⁴ is a family of
2-, 3-, ... variate aggregation functions, with the convention $F(x) = x$ for the
special case n = 1. We shall use the notation $f_n$ when we want to emphasize
that an aggregation function has n arguments. In general, two members of
such a family for distinct input sizes m and n need not be related. However,
we shall see that in the most interesting cases they are related, and sometimes
can be computed using one generic formula.

In the next section we study some generic properties of aggregation functions
and extended aggregation functions. Generally, a given property holds
for an extended aggregation function F if and only if it holds for every member
of the family $f_n$.


4 Sometimes extended aggregation functions are also referred to as aggregation
operators, see footnote 2.
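Examples

Arithmetic mean  $f_n(\mathbf{x}) = \frac{1}{n}(x_1 + x_2 + \ldots + x_n)$.

Geometric mean  $f_n(\mathbf{x}) = \sqrt[n]{x_1 x_2 \cdots x_n}$.

Harmonic mean  $f_n(\mathbf{x}) = \dfrac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \ldots + \frac{1}{x_n}}$.

Minimum  $\min(\mathbf{x}) = \min\{x_1, \ldots, x_n\}$.

Maximum  $\max(\mathbf{x}) = \max\{x_1, \ldots, x_n\}$.

Product  $f_n(\mathbf{x}) = x_1 x_2 \cdots x_n = \prod_{i=1}^{n} x_i$.

Bounded sum  $f_n(\mathbf{x}) = \min\{1, \sum_{i=1}^{n} x_i\}$.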


Note that in all mentioned examples we have extended aggregation func- 
tions, since the above generic formulae are valid for any n > 1. 
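Because each formula above is valid for any number of arguments, each can be written as a single routine that accepts a vector of arbitrary length. The following C++ sketch is illustrative only: the function names and the alias `Vec` are our own assumptions, and returning 0 from the harmonic mean when some input is 0 is an assumed convention for the case where the formula is undefined.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

using Vec = std::vector<double>;  // inputs are assumed to lie in [0,1]

double arithmetic_mean(const Vec& x) {
    assert(!x.empty());
    return std::accumulate(x.begin(), x.end(), 0.0) / x.size();
}

double product(const Vec& x) {
    assert(!x.empty());
    return std::accumulate(x.begin(), x.end(), 1.0, std::multiplies<double>());
}

double geometric_mean(const Vec& x) {
    return std::pow(product(x), 1.0 / x.size());
}

// Assumed convention: the harmonic mean is 0 if any input is 0.
double harmonic_mean(const Vec& x) {
    assert(!x.empty());
    double sum_of_reciprocals = 0.0;
    for (double xi : x) {
        if (xi == 0.0) return 0.0;
        sum_of_reciprocals += 1.0 / xi;
    }
    return x.size() / sum_of_reciprocals;
}

double minimum(const Vec& x) {
    assert(!x.empty());
    return *std::min_element(x.begin(), x.end());
}

double maximum(const Vec& x) {
    assert(!x.empty());
    return *std::max_element(x.begin(), x.end());
}

double bounded_sum(const Vec& x) {
    assert(!x.empty());
    return std::min(1.0, std::accumulate(x.begin(), x.end(), 0.0));
}
```

For instance, `bounded_sum({0.7, 0.6})` saturates at 1, while `minimum({0.2, 0.4, 0.9})` returns the smallest input, 0.2.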


1.2 What are aggregation functions used for? 


Consider the following prototypical situations. We have several criteria with 
respect to which we assess different options (or objects), and every option 
fulfills each criterion only partially (and has a score on [0, 1] scale). Our aim is 
to evaluate the combined score for each option (possibly to rank the options). 

We may decide to average all the scores. This is a typical approach in 
sports competitions (like artistic skating) where the criteria are scores given 
by judges. The total score is the arithmetic mean. 


$$f(x_1, \ldots, x_n) = \frac{1}{n}(x_1 + x_2 + \ldots + x_n).$$


We may decide to take a different approach: low scores pull the overall score 
down. The total score will be no greater than the minimum individual criterion 
score. This is an example of conjunctive behavior. Conjunctive aggregation 
functions are suitable to model conjunctions like 


If x is A AND y is B AND z is C THEN ...


where A, B and C are the criteria against which the parameters x,y,z are 
assessed. For example, this is how they choose astronauts: the candidates 
must fulfill this and that criteria, and having an imperfect score in just one 
criterion moves the total to that imperfect score, and further down if there is 
more than one imperfect score. 
Yet another approach is to let high scores push each other up. We start
with the largest individual score, and then each other nonzero score pushes 
the total up. The overall score will be no smaller than the maximum of all 
individual scores. This is an example of disjunctive behavior. Such aggregation 
functions model disjunction in logical rules like 


If x is A OR y is B OR z is C THEN ...


For example, consider collecting evidence supporting some hypothesis. Having 
more than one piece of supporting evidence makes the total support stronger 
than the support due to any single piece of evidence. A simple example may 
be: if you have fever you may have a cold. If you cough and sneeze, you may 
have a cold. But if you have fever, cough and sneeze at the same time, you 
are almost certain to have a cold. 

It is also possible to think of an aggregation scheme where low scores pull 
each other down and high scores push each other up. We need to set some 
threshold to distinguish low and high scores, say 0.5. Then an aggregation 
function will have conjunctive behavior when all $x_i < 0.5$, disjunctive behavior
when all $x_i > 0.5$, and either conjunctive, disjunctive or averaging behavior
when there are some low and some high scores. 
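As a hedged illustration of such a scheme, the sketch below uses the so-called 3-Π function, the product of the inputs divided by the sum of that product and the product of their complements, which has 0.5 as its threshold. This particular function is our choice of example and is not taken from the passage above; the name `three_pi` and the convention of returning 0 in the undefined case 0/0 are assumptions.

```cpp
#include <cassert>
#include <vector>

// 3-Pi function: prod(x_i) / (prod(x_i) + prod(1 - x_i)).
// Assumed convention: the undefined case 0/0 (some x_i = 0 and some
// x_j = 1 at the same time) returns 0.
double three_pi(const std::vector<double>& x) {
    assert(!x.empty());
    double p = 1.0;  // product of the inputs
    double q = 1.0;  // product of the complements 1 - x_i
    for (double xi : x) {
        assert(0.0 <= xi && xi <= 1.0);
        p *= xi;
        q *= 1.0 - xi;
    }
    const double denom = p + q;
    return denom == 0.0 ? 0.0 : p / denom;
}
```

With two low scores, `three_pi({0.2, 0.3})` falls below 0.2; with two high scores, `three_pi({0.8, 0.7})` rises above 0.8; and with one low and one high score, `three_pi({0.2, 0.8})` lies between the two inputs.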

One typical use of such aggregation functions is when low scores are inter- 
preted as “negative” information, and high scores as “positive” information. 
Sometimes scientists also use a bipolar scale ([-1,1] instead of [0,1], in which
case the threshold is 0).


Multiple attribute decision making 


In problems of multiple attribute decision making (sometimes interchangeably
called multiple criteria decision making⁵) [55, 132, 133], an alternative (a
decision) has to be chosen based on several, usually conflicting criteria. The
alternatives are evaluated by using attributes, or features, which are expressed
numerically.⁶ For example, when buying a car, the attributes are usually the
price, quality, fuel consumption, size, power, brand (a nominal attribute), etc.
In order to choose the best alternative, one needs to combine the values of the
attributes in some way. One popular approach is called Multi-Attribute Utility
Theory (MAUT). It assigns a numerical score to each alternative, called its
utility $u(a_1, \ldots, a_n)$. Values $a_i$ denote the numerical scores of each attribute
i. Note that a bigger value of $a_i$ does not imply “better”; for example, when
choosing a car, one may prefer medium-sized cars.

The basic assumption of MAUT is that the total utility is a function of
individual utilities $x_i = u_i(a_i)$, i.e., $u(\mathbf{a}) = u(u_1(a_1), \ldots, u_n(a_n)) = u(\mathbf{x})$. The


5 Multiple criteria decision making, besides multiple attribute decision making, also
involves multiple objective decision making [133].

6 Of course, sometimes attributes are qualitative, or ordinal. They may be converted
to a numerical scale, e.g., using utility functions.
individual utilities depend only on the corresponding attribute $a_i$, and not on
the other attributes, although the attributes can be correlated of course. This 
is a simplifying assumption, but it allows one to model quite accurately many 
practical problems. 

The rational decision-making axiom implies that one cannot prefer an 
alternative which differs from another alternative in that it is inferior with 
respect to some individual utilities, but not superior with respect to the 
other ones. Mathematically, this means that the function u is monotone non- 
decreasing with respect to all arguments. If we scale the utilities to [0,1], and 
add the boundary conditions u(0,...,0) = 0, u(1,...,1) = 1, we obtain that 
u is an aggregation function. 

The most common aggregation functions used in MAUT are the additive
and multiplicative utilities, which are the weighted arithmetic and geometric
means (see p. 24). However, disjunctive and conjunctive methods are also
popular. Compensation is an important notion in MAUT, expressing the
concept of trade-off. It implies that the decrease of the total utility u due to 
some attributes can be compensated by the increase due to other attributes. 
For example, when buying an item, the increase in price can be compensated 
by the increase in quality. 


Group decision making 


Consider a group of n experts who evaluate one (or more) alternatives. Each 
expert expresses his/her evaluation on a numerical scale (which is the strength 
of this expert’s preference). The goal is to combine all experts’ evaluations 
into a single score [36, [s7, . By scaling the preferences to [0,1] we obtain 
a vector of inputs $\mathbf{x} = (x_1, \ldots, x_n)$, where $x_i$ is the degree of preference of the
i-th expert. The overall evaluation will be some value y = f(x). 

The most commonly used aggregation method in group decision making 
is the weighted arithmetic mean. If experts have different standing, then their 
scores are assigned different weights. However, experts’ opinions may be cor- 
related, and one has to model their interactions and groupings. Further, one 
has to model such concepts as dictatorship, veto, oligarchies, etc. 

From the mathematical point of view, the function f needs to be mono- 
tone non-decreasing (the increase in one expert’s score cannot lead to the 
decrease of the overall score) and satisfy the boundary conditions. Thus f is 
an aggregation function. 


Fuzzy logic and rule based systems 


In fuzzy set theory [280], the membership degrees of objects in fuzzy sets are numbers
from [0, 1]. Fuzzy sets allow one to model vagueness and uncertainty which are 
very often present in natural languages. For example the set “ripe bananas” is 
fuzzy, as there are obviously different degrees of ripeness. Similarly, the sets of 
“small numbers”, “tall people”, “high blood pressure” are fuzzy, since there is 
no clear cutoff which discriminates objects between those that are in the set
and those that are not. An object may simultaneously belong to a fuzzy set
and its complement. A fuzzy set A defined on a set of objects X is represented
by a membership function $\mu_A : X \to [0,1]$, in such a way that for any object
$x \in X$ the value $\mu_A(x)$ measures the degree of membership of x in the fuzzy
set A.

The classical operations on fuzzy sets, intersection and union, are based
on the minimum and maximum, i.e., $\mu_{A \cap B} = \min\{\mu_A, \mu_B\}$, $\mu_{A \cup B} = \max\{\mu_A, \mu_B\}$.
However, other aggregation functions, such as the product, have
been considered almost since the inception of fuzzy set theory. Nowadays
a large class of conjunctive and disjunctive functions, the triangular norms
and conorms, are used to model fuzzy set intersection and union.

Fuzzy set theory has proved to be extremely useful for solving many real 
world problems, in which the data are imprecise, e.g., [s1] lsd, el 234 Bes] 
[286]. Fuzzy control in industrial systems and consumer electronics is one no- 
table example of the practical applications of fuzzy logic. 

Rule based systems, especially fuzzy rule based systems [285], involve
aggregation of various numerical scores, which correspond to degrees of
satisfaction of rule antecedents. A rule can be a statement like


If x is A AND y is B AND z is C THEN some action


The antecedents are usually membership values of x in A, y in B, etc. The
strength of “firing” the rule is determined by an aggregation function that
combines membership values $f(\mu_A(x), \mu_B(y), \mu_C(z))$.
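To make the firing strength concrete, the sketch below evaluates a rule antecedent with two common choices of f, the minimum and the product. Everything named here is an assumption made for illustration: the triangular membership function `triangular`, its parameters in the usage note, and the function names `firing_strength_min` and `firing_strength_product`.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical triangular membership function with feet a and c and peak b.
double triangular(double x, double a, double b, double c) {
    assert(a < b && b < c);
    if (x <= a || x >= c) return 0.0;
    return x <= b ? (x - a) / (b - a) : (c - x) / (c - b);
}

// Firing strength of "If x is A AND y is B AND ..." using the minimum.
double firing_strength_min(const std::vector<double>& degrees) {
    assert(!degrees.empty());
    return *std::min_element(degrees.begin(), degrees.end());
}

// The same antecedent evaluated with the product instead of the minimum.
double firing_strength_product(const std::vector<double>& degrees) {
    assert(!degrees.empty());
    return std::accumulate(degrees.begin(), degrees.end(), 1.0,
                           std::multiplies<double>());
}
```

For example, with made-up fuzzy sets HIGH = `triangular(t, 25, 35, 45)` for temperature and MEDIUM = `triangular(h, 30, 50, 70)` for humidity, the inputs t = 30 and h = 55 have membership degrees 0.5 and 0.75, so the rule fires with strength 0.5 under the minimum and 0.375 under the product.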

There are many other uses of aggregation functions, detailed in the refer- 
ences at the end of this Chapter. 


1.3 Classification and general properties 


1.3.1 Main classes 


As we have seen in the previous section, there are various semantics of
aggregation, and the main classes are determined according to these semantics.
In some cases we require that high and low inputs average each other; in
other cases aggregation functions model logical connectives (disjunction and
conjunction), so that the inputs reinforce each other; and sometimes the
behavior of aggregation functions depends on the inputs. The four main classes
81), 


of aggregation functions are [43] [ba [36, 


Averaging, 
Conjunctive, 
Disjunctive, 
Mixed. 
Definition 1.7 (Averaging aggregation). An aggregation function f has
averaging behavior (or is averaging) if for every x it is bounded by

$$\min(\mathbf{x}) \le f(\mathbf{x}) \le \max(\mathbf{x}).$$

Definition 1.8 (Conjunctive aggregation). An aggregation function f
has conjunctive behavior (or is conjunctive) if for every x it is bounded by

$$f(\mathbf{x}) \le \min(\mathbf{x}) = \min(x_1, x_2, \ldots, x_n).$$

Definition 1.9 (Disjunctive aggregation). An aggregation function f has
disjunctive behavior (or is disjunctive) if for every x it is bounded by

$$f(\mathbf{x}) \ge \max(\mathbf{x}) = \max(x_1, x_2, \ldots, x_n).$$

Definition 1.10 (Mixed aggregation). An aggregation function f is mixed
if it does not belong to any of the above classes, i.e., it exhibits different types
of behavior on different parts of the domain.
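The bounds in Definitions 1.7-1.9 are easy to test at a single input. The C++ sketch below, with the hypothetical names `averaging_at`, `conjunctive_at` and `disjunctive_at`, checks the behavior of an already computed value f(x) at one point x only; deciding the class of a function requires the bound to hold for every x, which such a pointwise test cannot establish.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Vec = std::vector<double>;

// fx is the value f(x) of some aggregation function, computed elsewhere.

// min(x) <= f(x) <= max(x)
bool averaging_at(const Vec& x, double fx) {
    assert(!x.empty());
    const auto mm = std::minmax_element(x.begin(), x.end());
    return *mm.first <= fx && fx <= *mm.second;
}

// f(x) <= min(x)
bool conjunctive_at(const Vec& x, double fx) {
    assert(!x.empty());
    return fx <= *std::min_element(x.begin(), x.end());
}

// f(x) >= max(x)
bool disjunctive_at(const Vec& x, double fx) {
    assert(!x.empty());
    return fx >= *std::max_element(x.begin(), x.end());
}
```

At x = (0.2, 0.6), the arithmetic mean 0.4 passes only the averaging test and the product 0.12 passes only the conjunctive test, whereas the minimum 0.2 passes both the averaging and the conjunctive tests.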


1.3.2 Main properties 


Definition 1.11 (Idempotency). An aggregation function f is called
idempotent if for every input $\mathbf{x} = (t,t,\ldots,t)$, $t \in [0,1]$, the output is $f(t,t,\ldots,t) = t$.


Note 1.12. Because of monotonicity of f, idempotency is equivalent to averaging
behavior.⁷

The aggregation functions minimum and maximum are the only two func- 
tions that are at the same time conjunctive (disjunctive) and averaging, and 
hence idempotent. 


Example 1.13. The arithmetic mean is an averaging (idempotent) aggregation
function

$$f(\mathbf{x}) = \frac{1}{n}(x_1 + x_2 + \ldots + x_n).$$

Example 1.14. The geometric mean is also an averaging (idempotent)
aggregation function

$$f(\mathbf{x}) = \sqrt[n]{x_1 x_2 \cdots x_n}.$$


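7 Proof: Take any $\mathbf{x} \in [0,1]^n$, and denote $p = \min(\mathbf{x})$, $q = \max(\mathbf{x})$. By
monotonicity, $p = f(p,p,\ldots,p) \le f(\mathbf{x}) \le f(q,q,\ldots,q) = q$. Hence $\min(\mathbf{x}) \le f(\mathbf{x}) \le \max(\mathbf{x})$.
The converse: let $\min(\mathbf{x}) \le f(\mathbf{x}) \le \max(\mathbf{x})$. By taking $\mathbf{x} = (t,t,\ldots,t)$,
$\min(\mathbf{x}) = \max(\mathbf{x}) = f(\mathbf{x}) = t$, hence idempotency.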
Example 1.15. The product is a conjunctive aggregation function

$$f(\mathbf{x}) = \prod_{i=1}^{n} x_i = x_1 x_2 \cdots x_n.$$


Definition 1.16 (Symmetry). An aggregation function f is called symmetric
if its value does not depend on the permutation of the arguments, i.e.,

$$f(x_1, x_2, \ldots, x_n) = f(x_{P(1)}, x_{P(2)}, \ldots, x_{P(n)})$$

for every x and every permutation $P = (P(1), P(2), \ldots, P(n))$ of $(1,2,\ldots,n)$.


The semantic interpretation of symmetry is anonymity, or equality. For
example, equality of judges in sports competitions: all inputs are treated
equally, and the output does not change if the judges swap seats.⁸ On the
other hand, in shareholders meetings the votes are not symmetric as they
depend on the number of shares each voter has.


Example 1.17. The arithmetic and geometric means and the product in
Examples 1.13-1.15 are symmetric aggregation functions. A weighted arithmetic
mean with non-equal weights $w_1, w_2, \ldots, w_n$ that are non-negative and add to
one is not symmetric,

$$f(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n.$$


Permutation of arguments is very important in aggregation, as it helps
to express symmetry, as well as to define other concepts. A permutation of
$(1, 2, \ldots, 5)$ is just a tuple like $(5, 3, 2, 1, 4)$. There are
$n! = 1 \times 2 \times 3 \times \ldots \times n$ possible permutations of $(1, 2, \ldots, n)$.
We will denote a vector whose components are arranged in the order given
by a permutation P by $\mathbf{x}_P = (x_{P(1)}, x_{P(2)}, \ldots, x_{P(n)})$. In our example,
$\mathbf{x}_P = (x_5, x_3, x_2, x_1, x_4)$. We will frequently use the following special
permutations of the components of x.


Definition 1.18. We denote by $\mathbf{x}_{\nearrow}$ the vector obtained from x by arranging
its components in non-decreasing order, that is, $\mathbf{x}_{\nearrow} = \mathbf{x}_P$ where P is the
permutation such that $x_{P(1)} \le x_{P(2)} \le \ldots \le x_{P(n)}$.

Similarly, we denote by $\mathbf{x}_{\searrow}$ the vector obtained from x by arranging its
components in non-increasing order, that is, $\mathbf{x}_{\searrow} = \mathbf{x}_P$ where P is the
permutation such that $x_{P(1)} \ge x_{P(2)} \ge \ldots \ge x_{P(n)}$.


8 It is frequently interpreted as an anonymity criterion: anonymous ballot papers
can be counted in any order.
Note 1.19. In fuzzy sets literature, the notation $\mathbf{x}_{()} = (x_{(1)}, \ldots, x_{(n)})$ is often used
to denote both $\mathbf{x}_{\nearrow}$ and $\mathbf{x}_{\searrow}$, depending on the context.


Note 1.20. We can express the symmetry property by an equivalent statement that
for every input vector x

$$f(\mathbf{x}) = f(\mathbf{x}_{\nearrow}) \quad (\text{or } f(\mathbf{x}) = f(\mathbf{x}_{\searrow})),$$

rather than $f(\mathbf{x}) = f(\mathbf{x}_P)$ for every permutation. This gives us a shortcut for
calculating the value of a symmetric aggregation function for a given x by using
the sort() operation (see Fig. 1.1).


#include <algorithm>

/* Comparator required to specify the user's sorting order (decreasing). */
struct Greaterthan {
    bool operator()(const double& a, const double& b) const { return a > b; }
} greaterthan;

/* The aggregation function itself, defined elsewhere for x sorted in
   decreasing order. */
double f(int n, double* x);

double f_symm(int n, double* x)
{
    /* note: x is sorted in place, in decreasing order */
    std::sort(x, x + n, greaterthan);
    /* evaluate f for x sorted in decreasing order */
    return f(n, x);
    /* to sort in increasing order, define the structure Lessthan
       in an analogous way. */
}

Fig. 1.1. A C++ code for evaluation of a symmetric aggregation function. f(x) is
defined for $\mathbf{x}_{\searrow}$; f_symm(x) returns a correct value for any x.


Definition 1.21 (Strict monotonicity). An aggregation function f is 
strictly monotone increasing if 


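$$\mathbf{x} \le \mathbf{y} \text{ but } \mathbf{x} \ne \mathbf{y} \text{ implies } f(\mathbf{x}) < f(\mathbf{y}) \text{ for every } \mathbf{x}, \mathbf{y} \in [0,1]^n. \qquad (1.3)$$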


Note 1.22. Notice the difference between [$\mathbf{x} \le \mathbf{y}$, $\mathbf{x} \ne \mathbf{y}$] and $\mathbf{x} < \mathbf{y}$. The latter
implies that for all components of x and y we have $x_i < y_i$, whereas the former
means that at least one component of y is greater than that of x, i.e., $\exists i$ such that
$x_i < y_i$ and $\forall j : x_j \le y_j$.

Strict monotonicity is a rather restrictive property. Note that there are 
no strictly monotone conjunctive or disjunctive aggregation functions. This 
is because every conjunctive function coincides with min(x) for those x that 
have at least one zero component, and min is not strictly monotone (similarly, 
disjunctive aggregation functions coincide with max(x) for those x that have 
at least one component $x_i = 1$). However, strict monotonicity on the semi-open
set $]0,1]^n$ (respectively $[0,1[^n$) is often considered for conjunctive (disjunctive)
aggregation functions, see Chapter 3. Of course, there are plenty of strictly
increasing averaging aggregation functions, such as arithmetic means.

It is often the case that when an input vector contains a specific value, 
this value can be omitted. For example, consider a conjunctive aggregation 
function f to model the rule for buying bananas 


If price is LOW AND banana is RIPE THEN buy it.


Conjunction means that we only want cheap and ripe bananas, but both are
matters of degree, expressed on the [0,1] scale. If one of the arguments $x_i = 1$,
e.g., we find perfectly ripe bananas, then the outcome of the rule is equal to
the degree of our satisfaction with the price.


Definition 1.23 (Neutral element). An aggregation function f has a neutral
element $e \in [0,1]$, if for every $t \in [0,1]$ in any position it holds

$$f(e, \ldots, e, t, e, \ldots, e) = t.$$


For extended aggregation functions, we have a stronger version of this 
property, which relates aggregation functions with a different number of ar- 
guments. 


Definition 1.24 (Strong neutral element). An extended aggregation
function F has a neutral element $e \in [0,1]$, if for every x with $x_i = e$, for
some $1 \le i \le n$, and every $n \ge 2$,

$$f_n(x_1, \ldots, x_{i-1}, e, x_{i+1}, \ldots, x_n) = f_{n-1}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n).$$


When n = 2, we have $f(t, e) = f(e, t) = t$. Then by iterating this property
we obtain as a consequence that every member $f_n$ of the family has the neutral
element e, i.e.,

$$f_n(e, \ldots, e, t, e, \ldots, e) = t,$$

for t in any position.


Note 1.25. A neutral element, if it exists, is unique (see footnote 9). It can be any
number from [0, 1].


Note 1.26. Observe that if an aggregation function f has neutral element e = 1
(respectively e = 0) then f is necessarily conjunctive (respectively disjunctive).
Indeed, if f has neutral element e = 1, then by monotonicity it is
$f(x_1, \ldots, x_n) \le f(1, \ldots, 1, x_i, 1, \ldots, 1) = x_i$ for any $i \in \{1, \ldots, n\}$, and this
implies $f \le \min$ (the proof for the case e = 0 is analogous).


9 Proof: Assume f has two neutral elements e and u. Then $u = f(e, u) = e$, therefore
e = u. For n variables, assume e < u. By monotonicity,
$e = f(e, u, \ldots, u, \ldots, u) \ge f(e, e, \ldots, e, u, e, \ldots, e) = u$, hence we have a
contradiction. The case e > u leads to a similar contradiction.
Note 1.27. The concept of a neutral element has been recently extended to that of
neutral tuples, see Rg. 


Example 1.28. The product function $f(\mathbf{x}) = \prod x_i$ has neutral element e = 1.
Similarly, min function has neutral element e = 1 and max function has 
neutral element e = 0. The arithmetic mean does not have a neutral element. 
We shall see later on that any triangular norm has e = 1, and any triangular 
conorm has e = 0. 


It may also be the case that one specific value a of any argument yields 
the output a. For example, if we use conjunction for aggregation, then if any 
input is 0, then the output must be 0 as well. In the banana buying rule above, 
if the banana is green, we do not buy it at any price. 


Definition 1.29 (Absorbing element (annihilator)). An aggregation
function f has an absorbing element $a \in [0,1]$ if

$$f(x_1, \ldots, x_{i-1}, a, x_{i+1}, \ldots, x_n) = a,$$
for every x such that $x_i = a$ with a in any position.


Note 1.30. An absorbing element, if it exists, is unique. It can be any number from
[0, 1].


Example 1.31. Any conjunctive aggregation function has absorbing element
a = 0. Any disjunctive aggregation function has absorbing element a = 1.
This is a simple consequence of Definitions 1.8 and 1.9. Some averaging
functions also have an absorbing element, for example the geometric mean

$$f(\mathbf{x}) = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

has the absorbing element a = 0.


Note 1.32. An aggregation function with an annihilator in ]0, 1[ cannot have a
neutral element (see footnote 10). But it may have a neutral element if a = 0 or a = 1.


Note 1.33. The concept of an absorbing element has been recently extended to that
of absorbing tuples, see pg. 


10 Proof: Suppose $a \in ]0,1[$ is the absorbing element and $e \in [0,1]$ is the neutral
element. Then if $a \le e$, we get the contradiction a = 0, since it is
$a = f(a, \ldots, a, 0) \le f(e, \ldots, e, 0) = 0$. Similarly, if a > e then
$a = f(a, \ldots, a, 1) \ge f(e, \ldots, e, 1) = 1$.
Definition 1.34 (Zero divisor). An element $a \in ]0,1[$ is a zero divisor of
an aggregation function f if for all $i \in \{1, \ldots, n\}$ there exists some $\mathbf{x} \in ]0,1]^n$
such that its i-th component is $x_i = a$, and it holds $f(\mathbf{x}) = 0$, i.e., the equality

$$f(x_1, \ldots, x_{i-1}, a, x_{i+1}, \ldots, x_n) = 0,$$
can hold for some x > 0 with a at any position.


Note 1.35. Because of monotonicity of f, if a is a zero divisor, then all values $b \in ]0, a]$
are also zero divisors.


The interpretation of zero divisors is straightforward: if one of the inputs 
takes the value a, or a smaller value, then the aggregated value could be zero, 
for some x. So it is possible to have the aggregated value zero, even if all the 
inputs are positive. The largest value a (or rather an upper bound on a) plays 
the role of a threshold, the lower bound on all the inputs which guarantees a 
non-zero output. That is, if b is not a zero divisor, then $f(\mathbf{x}) > 0$ if all $x_i \ge b$.


Example 1.36. Averaging aggregation functions do not have zero divisors. But
the function $f(x_1, x_2) = \max\{0, x_1 + x_2 - 1\}$ has a zero divisor a = 0.999,
which means that the output can be zero even if any of the components $x_1$
or $x_2$ is as large as 0.999, provided that the other component is sufficiently
small. However, 1 is not a zero divisor.
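The zero divisor of Example 1.36 can be explored numerically. The sketch below is only a crude grid search under our own assumptions (the names example_function and looks_like_zero_divisor and the grid resolution are hypothetical); it looks for some strictly positive second argument that makes the output zero. Since the function max{0, x1 + x2 - 1} is symmetric, checking one position is enough here.

```cpp
#include <algorithm>
#include <cassert>

// The two-argument function max{0, x1 + x2 - 1} (illustrative name).
double example_function(double x1, double x2)
{
    return std::max(0.0, x1 + x2 - 1.0);
}

// Crude grid test, not a proof: is there some x2 > 0 on a grid of `steps`
// points in ]0,1] such that example_function(a, x2) is zero?
bool looks_like_zero_divisor(double a, int steps = 10000)
{
    assert(steps > 0);
    for (int k = 1; k <= steps; ++k) {
        const double x2 = static_cast<double>(k) / steps;
        if (example_function(a, x2) == 0.0) return true;
    }
    return false;
}
```

With the default grid, the search succeeds for a = 0.999 (for instance with x2 = 0.0001) and fails for a = 1, in agreement with the example.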


Zero divisors exist for aggregation functions that exhibit conjunctive be- 
havior, at least on parts of their domain, i.e., conjunctive and mixed aggre- 
gation functions. For disjunctive aggregation functions we have an analogous 
definition. 


Definition 1.37 (One divisor). An element $a \in ]0,1[$ is a one divisor of an
aggregation function f if for all $i = 1, \ldots, n$ there exists some $\mathbf{x} \in [0,1[^n$ such
that its i-th component is $x_i = a$ and it holds $f(\mathbf{x}) = 1$, i.e., the equality

$$f(x_1, \ldots, x_{i-1}, a, x_{i+1}, \ldots, x_n) = 1,$$
can hold for some x < 1 with a at any position.


The interpretation is similar: the value of any inputs larger than a can 
make the output f(x) = 1, even if none of the inputs is actually 1. On the 
other hand, if b is not a one divisor, then the output cannot be one if all the 
inputs are no larger than b. 

The following property is useful for construction of n-ary aggregation func- 
tions from a single two-variable function. 


Definition 1.38 (Associativity). A two-argument function f is associative
if $f(f(x_1, x_2), x_3) = f(x_1, f(x_2, x_3))$ holds for all $x_1, x_2, x_3$ in its domain.
Consequently, the n-ary aggregation function can be constructed in a
unique way by iteratively applying $f_2$ as

$$f_n(x_1, \ldots, x_n) = f_2(f_2(\ldots f_2(x_1, x_2), x_3), \ldots, x_n).$$


Thus bivariate associative aggregation functions univocally define extended 
aggregation functions. 


Example 1.39. The product, minimum and maximum are associative aggrega- 
tion functions. The arithmetic mean is not associative. 


Associativity simplifies calculation of aggregation functions, and it effec- 
tively allows one to easily aggregate any number of inputs, as the following 
code in Fig. 1.2 illustrates. It is not the only way of doing this (for instance
the arithmetic or geometric means are also easily computed for any number 
of inputs). 

Another construction gives what are called recursive extended aggregation
functions by Montero [63]. It involves a family of two-variate functions $f_2^n$,
$n = 2, 3, \ldots$


Definition 1.40 (Recursive extended aggregation function). An extended
aggregation function F is recursive by Montero if the members $f_n$ are
defined from a family of two-variate aggregation functions $f_2^n$ recursively as

$$f_n(x_1, \ldots, x_n) = f_2^n(f_{n-1}(x_1, \ldots, x_{n-1}), x_n),$$
starting with $f_2 = f_2^2$.


Each extended aggregation function built from an associative bivariate 
aggregation function is recursive by Montero, but the converse is not true. 


Example 1.41. Define $f_2^n(t_1, t_2) = \frac{(n-1)t_1 + t_2}{n}$. Then $f_n(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} x_i$, the
arithmetic mean (which is not associative).
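The recursion of Definition 1.40 can be sketched in C++ as a simple loop. The sketch assumes the two-variable family f_2^n(t1, t2) = ((n - 1) t1 + t2) / n, for which the recursion should reproduce the arithmetic mean; the names f2n and recursive_mean are ours.

```cpp
#include <cassert>
#include <vector>

// Assumed two-variable family: f_2^n(t1, t2) = ((n - 1) * t1 + t2) / n.
double f2n(int n, double t1, double t2)
{
    return ((n - 1) * t1 + t2) / n;
}

// f_n built recursively, written as a loop:
// f_2 = f_2^2(x_1, x_2), then f_k = f_2^k(f_{k-1}, x_k) for k = 3, ..., n.
double recursive_mean(const std::vector<double>& x)
{
    assert(x.size() >= 2);
    double s = f2n(2, x[0], x[1]);
    for (std::size_t k = 3; k <= x.size(); ++k)
        s = f2n(static_cast<int>(k), s, x[k - 1]);
    return s;
}
```

For x = (0.1, 0.4, 0.7, 1.0) the intermediate values are 0.25, 0.4 and 0.55, the last being the arithmetic mean of the four inputs (up to rounding).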


Definition 1.42 (Decomposable extended aggregation function). An
extended aggregation function F is decomposable if for all m, n = 1, 2, ... and
for all $\mathbf{x} \in [0,1]^m$, $\mathbf{y} \in [0,1]^n$:

$$f_{m+n}(x_1, \ldots, x_m, y_1, \ldots, y_n) = f_{m+n}(\underbrace{f_m(x_1, \ldots, x_m), \ldots, f_m(x_1, \ldots, x_m)}_{m \text{ times}}, y_1, \ldots, y_n). \qquad (1.4)$$
/* Input: the vector of arguments x (n >= 2 components), two-variable function f
   Output: the value f_n(x). */

// Recursive method
double fn_recursive(int n, const double* x, double (*f)(double, double))
{
    if (n == 2) return f(x[0], x[1]);
    else return f(fn_recursive(n - 1, x, f), x[n - 1]);
}

// Non-recursive method
double fn_iterative(int n, const double* x, double (*f)(double, double))
{
    double s = f(x[0], x[1]);
    for (int i = 2; i < n; i++) s = f(s, x[i]);
    return s;
}

Fig. 1.2. Recursive and non-recursive calculation of an associative function. 


A continuous decomposable extended aggregation function is always idem- 
potent. 

Another useful property, which generalizes both symmetry and associativity,
and is applicable to extended aggregation functions, is called bisymmetry.
Consider the situation in which m jurymen evaluate an alternative
with respect to n criteria. Let $x_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$ denote the score
given by the i-th juryman with respect to the j-th criterion. To compute
the global score $f_{mn}(x_{11}, \ldots, x_{1n}, \ldots, x_{mn})$ we can either evaluate the scores
given by the i-th juryman, $y_i = f_n(x_{i1}, \ldots, x_{in})$, and then aggregate them
as $z = f_m(y_1, \ldots, y_m)$, or, alternatively, aggregate scores of all jurymen with
respect to each individual criterion j, i.e., compute $\tilde{y}_j = f_m(x_{1j}, \ldots, x_{mj})$,
and then aggregate these scores as $\tilde{z} = f_n(\tilde{y}_1, \ldots, \tilde{y}_n)$. The third alternative
is to aggregate all the scores by an aggregation function $f_{mn}(\mathbf{x})$.

This is illustrated in Table 1.1. We can either aggregate scores in each
row, and then aggregate the totals in the last column of this table, or we can 
aggregate scores in each column, and then aggregate the totals in the last row, 
or aggregate all scores at once. The bisymmetry property simply means that 
all three methods lead to the same answer. 
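The three ways of aggregating a table of scores can be compared directly in code. The following C++ sketch uses the arithmetic mean as the aggregation function for every arity; this choice, the type alias Scores and all function names are our assumptions, and the score table is assumed to be rectangular and non-empty.

```cpp
#include <cassert>
#include <vector>

// scores[i][j]: score given by juror i with respect to criterion j.
using Scores = std::vector<std::vector<double>>;

double mean(const std::vector<double>& v)
{
    assert(!v.empty());
    double s = 0.0;
    for (double t : v) s += t;
    return s / v.size();
}

// Aggregate each row (juror) first, then aggregate the row totals.
double rows_then_columns(const Scores& x)
{
    std::vector<double> y;
    for (const auto& row : x) y.push_back(mean(row));
    return mean(y);
}

// Aggregate each column (criterion) first, then aggregate the column totals.
double columns_then_rows(const Scores& x)
{
    assert(!x.empty());
    std::vector<double> y;
    for (std::size_t j = 0; j < x[0].size(); ++j) {
        std::vector<double> column;
        for (const auto& row : x) column.push_back(row[j]);
        y.push_back(mean(column));
    }
    return mean(y);
}

// Aggregate all m*n scores at once.
double all_at_once(const Scores& x)
{
    std::vector<double> all;
    for (const auto& row : x) all.insert(all.end(), row.begin(), row.end());
    return mean(all);
}
```

For two jurors with scores (0.2, 0.4, 0.6) and (0.8, 1.0, 0.0), the row totals are 0.4 and 0.6, the column totals are 0.5, 0.7 and 0.3, and all three routes give 0.5 up to rounding.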


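Definition 1.43 (Bisymmetry). An extended aggregation function F is
bisymmetric if for all m, n = 1, 2, ... and for all $\mathbf{x} \in [0,1]^{mn}$:

$$f_{mn}(\mathbf{x}) = f_m(f_n(x_{11}, \ldots, x_{1n}), \ldots, f_n(x_{m1}, \ldots, x_{mn})) \qquad (1.5)$$

$$= f_n(f_m(x_{11}, \ldots, x_{m1}), \ldots, f_m(x_{1n}, \ldots, x_{mn})).$$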
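Table 1.1. The table of scores to be aggregated by m jurymen with respect to n
criteria.

Juror \ Criterion | 1             | 2             | 3             | ... | n             | Total
1                 | $x_{11}$      | $x_{12}$      | $x_{13}$      | ... | $x_{1n}$      | $y_1$
2                 | $x_{21}$      | $x_{22}$      | $x_{23}$      | ... | $x_{2n}$      | $y_2$
3                 | $x_{31}$      | $x_{32}$      | $x_{33}$      | ... | $x_{3n}$      | $y_3$
...               | ...           | ...           | ...           | ... | ...           | ...
m                 | $x_{m1}$      | $x_{m2}$      | $x_{m3}$      | ... | $x_{mn}$      | $y_m$
Total             | $\tilde{y}_1$ | $\tilde{y}_2$ | $\tilde{y}_3$ | ... | $\tilde{y}_n$ | $z$ / $\tilde{z}$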


Note 1.44. A symmetric associative extended aggregation function is bisymmetric.
However there are symmetric and bisymmetric non-associative extended aggregation
functions, e.g., the arithmetic and geometric means. The extended aggregation
function defined by $f(\mathbf{x}) = x_1$ (projection to the first coordinate) is bisymmetric and
associative but not symmetric. The extended aggregation function
$f(\mathbf{x}) = \left(\frac{1}{n}\sum_i x_i\right)^2$ (square of the arithmetic mean) is symmetric but neither
associative nor bisymmetric. Every continuous associative extended aggregation
function is bisymmetric, but not necessarily symmetric.


Let us finally mention two properties describing the stability of aggregation 
functions with respect to some changes of the scale: 


Definition 1.45 (Shift-invariance). An aggregation function $f : [0,1]^n \to [0,1]$
is shift-invariant (or stable for translations) if for all $\lambda \in [-1,1]$ and for
all $(x_1, \ldots, x_n) \in [0,1]^n$ it is

$$f(x_1 + \lambda, \ldots, x_n + \lambda) = f(x_1, \ldots, x_n) + \lambda$$

whenever $(x_1 + \lambda, \ldots, x_n + \lambda) \in [0,1]^n$ and $f(x_1, \ldots, x_n) + \lambda \in [0,1]$.


Definition 1.46 (Homogeneity). An aggregation function $f : [0,1]^n \to [0,1]$
is homogeneous if for all $\lambda \in [0,1]$ and for all $(x_1, \ldots, x_n) \in [0,1]^n$ it is

$$f(\lambda x_1, \ldots, \lambda x_n) = \lambda f(x_1, \ldots, x_n).$$


Aggregation functions which are both shift-invariant and homogeneous are 
known as linear aggregation functions. 

Note that, due to the boundary conditions $f(0, \ldots, 0) = 0$ and $f(1, \ldots, 1) = 1$,
shift-invariant, homogeneous and linear aggregation functions are all
necessarily idempotent, and thus (see Note 1.12) they can only be found among
averaging functions. A prototypical example of a linear aggregation function
is the arithmetic mean.
1.3.3 Duality


It is often useful to draw a parallel between conjunctive and disjunctive aggre- 
gation functions, as they often satisfy very similar properties, just viewed from 
a different angle. The concept of a dual aggregation function helps with map- 
ping most properties of conjunctive aggregation functions to disjunctive ones. 
So essentially one studies conjunctive functions, and obtains the correspond- 
ing results for disjunctive functions by duality. There are also aggregation 
functions that are self-dual. 
First we need the concept of negation. 


Definition 1.47 (Strict negation). A univariate function N defined on
[0, 1] is called a strict negation, if its range is also [0,1] and it is strictly
monotone decreasing.


Definition 1.48 (Strong negation). A univariate function N defined on
[0, 1] is called a strong negation, if it is strictly decreasing and involutive (i.e.,
$N(N(t)) = t$ for all $t \in [0,1]$).


Example 1.49. The most commonly used strong negation is the standard negation
$N(t) = 1 - t$. We will use it throughout this book. Another example of
negation is $N(t) = 1 - t^2$, which is strict but not strong.


Note 1.50. A strictly monotone bijection is always continuous. Hence strict and 
strong negations are continuous. 


Despite its simplicity, the standard negation plays a fundamental role in
the construction of strong negations, since any strong negation can be built
from the standard negation using an automorphism (see footnote 12) of the
unit interval [239]:


Theorem 1.51. A function $N : [0,1] \to [0,1]$ is a strong negation if and
only if there exists an automorphism $\varphi : [0,1] \to [0,1]$ such that
$N = N_\varphi = \varphi^{-1} \circ (1 - \mathrm{Id}) \circ \varphi$, i.e., $N(t) = N_\varphi(t) = \varphi^{-1}(1 - \varphi(t))$ for any $t \in [0,1]$.


Example 1.52. Let us construct some strong negations:
• With $\varphi(a) = a^\lambda$ ($\lambda > 0$):
$$N_\varphi(t) = (1 - t^\lambda)^{1/\lambda}$$
(note that the standard negation is recovered with $\lambda = 1$);


11 A frequently used term is bijection: a bijection is a function $f : A \to B$, such that
for every $y \in B$ there is exactly one $x \in A$, such that y = f(x), i.e., it defines
a one-to-one correspondence between A and B. Because N is strictly monotone,
it is a one-to-one function. Its range is [0,1], hence it is an onto mapping, and
therefore a bijection.

12 Automorphism is another useful term: An automorphism is a strictly increasing
bijection of an interval onto itself, $[a, b] \to [a, b]$.
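• With $\varphi(a) = 1 - (1 - a)^\lambda$ ($\lambda > 0$):
$$N_\varphi(t) = 1 - \left[1 - (1 - t)^\lambda\right]^{1/\lambda}$$
(the standard negation is recovered with $\lambda = 1$);
• With $\varphi(a) = \frac{a}{\lambda + (1 - \lambda)a}$ ($\lambda > 0$):
$$N_\varphi(t) = \frac{\lambda^2 (1 - t)}{t + \lambda^2 (1 - t)}$$
(again, the standard negation is obtained with $\lambda = 1$);
• With $\varphi(a) = \frac{\ln(1 + \lambda a^\alpha)}{\ln(1 + \lambda)}$ ($\lambda > -1$, $\alpha > 0$):
$$N_\varphi(t) = \left(\frac{1 - t^\alpha}{1 + \lambda t^\alpha}\right)^{1/\alpha}.$$
Note that taking $\alpha = 1$ we get the family $N_\lambda(t) = \frac{1 - t}{1 + \lambda t}$, which is known
as Sugeno's family of strong negations (which includes, when $\lambda = 0$,
the standard negation).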





Fig. 1.3. Graphs of some strong negations in Example 1.52 with a fixed parameter $\lambda$.
Note 1.53. The characterization given in Theorem 1.51 allows one to easily show
that any strong negation N has a unique fixed point, i.e., there exists one and only
one value in [0,1], which we will denote $t_N$, verifying $N(t_N) = t_N$. Indeed, since
$N = N_\varphi$ for some automorphism $\varphi$, the equation $N(t_N) = t_N$ is equivalent to
$\varphi^{-1}(1 - \varphi(t_N)) = t_N$, whose unique solution is given by $t_N = \varphi^{-1}(1/2)$. Note that,
obviously, it is always $t_N \ne 0$ and $t_N \ne 1$.
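Two of the families of strong negations can be sketched in C++ as follows, together with the fixed point $\varphi^{-1}(1/2)$ for the generator $\varphi(a) = a^\lambda$. The formulas used are N(t) = (1 - t^lambda)^(1/lambda) and Sugeno's N(t) = (1 - t) / (1 + lambda t); the function names are ours and only illustrative.

```cpp
#include <cassert>
#include <cmath>

// N(t) = (1 - t^lambda)^(1/lambda), generated by phi(a) = a^lambda, lambda > 0.
double power_negation(double t, double lambda)
{
    assert(lambda > 0.0);
    return std::pow(1.0 - std::pow(t, lambda), 1.0 / lambda);
}

// Sugeno's family N(t) = (1 - t) / (1 + lambda * t), lambda > -1.
double sugeno_negation(double t, double lambda)
{
    assert(lambda > -1.0);
    return (1.0 - t) / (1.0 + lambda * t);
}

// Fixed point of the power negation: phi^{-1}(1/2) = (1/2)^(1/lambda).
double power_negation_fixed_point(double lambda)
{
    assert(lambda > 0.0);
    return std::pow(0.5, 1.0 / lambda);
}
```

Applying either negation twice should return the original argument up to rounding, and power_negation maps the value returned by power_negation_fixed_point to itself.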


Definition 1.54 (Dual aggregation function). Let $N : [0,1] \to [0,1]$ be
a strong negation and $f : [0,1]^n \to [0,1]$ an aggregation function. Then the
aggregation function $f_d$ given by

$$f_d(x_1, \ldots, x_n) = N(f(N(x_1), N(x_2), \ldots, N(x_n)))$$

is called the dual of f with respect to N, or, for short, the N-dual of f. When
using the standard negation, $f_d$ is given by

$$f_d(x_1, \ldots, x_n) = 1 - f(1 - x_1, \ldots, 1 - x_n)$$
and we will simply say that $f_d$ is the dual of f.
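As a hedged illustration, the dual with respect to the standard negation can be written as a small higher-order function in C++. The alias Aggregation and the names standard_dual, product and minimum are our own choices for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using Aggregation = std::function<double(const std::vector<double>&)>;

// Dual of f with respect to the standard negation N(t) = 1 - t:
// f_d(x) = 1 - f(1 - x_1, ..., 1 - x_n).
Aggregation standard_dual(Aggregation f)
{
    return [f](const std::vector<double>& x) {
        std::vector<double> negated(x.size());
        std::transform(x.begin(), x.end(), negated.begin(),
                       [](double t) { return 1.0 - t; });
        return 1.0 - f(negated);
    };
}

double product(const std::vector<double>& x)
{
    double p = 1.0;
    for (double t : x) p *= t;
    return p;
}

double minimum(const std::vector<double>& x)
{
    assert(!x.empty());
    return *std::min_element(x.begin(), x.end());
}
```

For two arguments, standard_dual(product) evaluates 1 - (1 - x1)(1 - x2) = x1 + x2 - x1 x2, and standard_dual(minimum) returns the maximum of the inputs (up to rounding).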


It is evident that the dual of a conjunctive aggregation function is disjunc- 
tive, and vice versa, regardless of what strong negation is used. Some functions 
are self-dual. 


Definition 1.55 (Self-dual aggregation function). Given a strong negation
N, an aggregation function f is self-dual with respect to N (for short,
N-self-dual or N-invariant), if

$$f(\mathbf{x}) = N(f(N(\mathbf{x}))),$$

where $N(\mathbf{x}) = (N(x_1), \ldots, N(x_n))$. For the standard negation we have
$$f(\mathbf{x}) = 1 - f(\mathbf{1} - \mathbf{x}),$$
and it is simply said that f is self-dual.


For example, the arithmetic mean is self-dual. We study N-self-dual
aggregation functions in detail in Chapter 4. It is worth noting that there are
no N-self-dual conjunctive or disjunctive aggregation functions.


1.3.4 Comparability 


Sometimes it is possible to compare different aggregation functions and es- 
tablish a certain order among them. We shall compare aggregation functions 
pointwise, i.e., for every $\mathbf{x} \in [0,1]^n$.
Definition 1.56. An aggregation function f is stronger than another aggregation
function g of the same number of arguments, if for all $\mathbf{x} \in [0,1]^n$:

$$g(\mathbf{x}) \le f(\mathbf{x}).$$

It is expressed as $g \le f$. When f is stronger than g, it is equivalently said that
g is weaker than f.


Not all aggregation functions are comparable. It may happen that f is stronger 
than g only on some part of the domain, and the opposite is true on the rest 
of the domain. In this case we say that f and g are incomparable. 


Example 1.57. The strongest conjunctive aggregation function is the minimum,
and the weakest disjunctive aggregation function is the maximum (see
Definitions 1.8 and 1.9). Any disjunctive aggregation function is stronger than
an averaging function, and any averaging function is stronger than a
conjunctive one.


1.3.5 Continuity and stability 


We will be mostly interested in continuous aggregation functions, which intu- 
itively are such functions that a small change in the input results in a small 
change in the output. There are some interesting aggregation functions 
that are discontinuous, but from the practical point of view continuity is very 
important for producing a stable output. 

The next definition is an even stronger continuity requirement. The reason
is that simple, or even uniform continuity is not sufficient to distinguish
functions that produce a “small” change in value due to a small change of the
argument (see footnote 14). The following definition puts a bound on the actual
change in value due to changes in the input.


13 A real function of n arguments is continuous if for any sequences $\{x_{ij}\}$,
$i = 1, \ldots, n$, such that $\lim_{j \to \infty} x_{ij} = y_i$ it holds
$\lim_{j \to \infty} f(x_{1j}, \ldots, x_{nj}) = f(y_1, \ldots, y_n)$. Because the domain $[0,1]^n$ is a
compact set, continuity is equivalent to its stronger version, uniform continuity.
For monotone functions we have a stronger result: an aggregation function is
uniformly continuous if and only if it is continuous in each argument (i.e., we
can check continuity by fixing all variables but one, and checking continuity of
each univariate function). However, general non-monotone functions can be
continuous in each variable without being continuous.

14 Think of this: a discontinuous (but integrable) function can be approximated
arbitrarily well by some continuous function (e.g., a polynomial). Thus based on
their values, or graphs, we cannot distinguish between continuous and
discontinuous integrable functions, as the values of both functions coincide up
to a tiny difference (which we can make as small as we want). A computer will not
see any difference between the two types of functions. Mathematically speaking,
the subset of continuous functions $C(\Omega)$ is dense in the set of integrable
functions $L^1(\Omega)$ on a compact set.
Definition 1.58 (Lipschitz continuity). An aggregation function f is called
Lipschitz continuous if there is a positive number M, such that for any two
vectors x, y in the domain of definition of f:

$$|f(\mathbf{x}) - f(\mathbf{y})| \le M d(\mathbf{x}, \mathbf{y}), \qquad (1.6)$$

where $d(\mathbf{x}, \mathbf{y})$ is a distance between x and y (see footnote 15). The smallest
such number M is called the Lipschitz constant of f (in the distance d).


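Typically the distance is the Euclidean distance between vectors,

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \ldots + (x_n - y_n)^2},$$

but it can be chosen as any norm $d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|$ (see footnote 16); typically
it is chosen as a p-norm. A p-norm, $p \ge 1$, is a function
$\|\mathbf{x}\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$ for finite p, and $\|\mathbf{x}\|_\infty = \max_{i=1,\ldots,n} |x_i|$.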

Thus, if the change in the input is $\delta = \|\mathbf{x} - \mathbf{y}\|$, then the output will change
by at most $M\delta$. Hence M can be interpreted as the upper bound on the rate
of change of a function. If a function f is differentiable, then M is simply
the upper bound on the norm of its gradient. All functions that are continuously
differentiable on $[0,1]^n$ are necessarily Lipschitz-continuous, but not vice
versa. However, any Lipschitz function is differentiable “almost” everywhere
(see footnote 17).

We pay attention to the rate of change of a function because of the ever
present input inaccuracies. If the aggregation function receives an inaccurate
input $\tilde{\mathbf{x}} = (x_1 + \delta_1, \ldots, x_n + \delta_n)$, contaminated with some noise $(\delta_1, \ldots, \delta_n)$,
we do not expect the output $f(\tilde{\mathbf{x}})$ to be substantially different from $f(\mathbf{x})$. The
Lipschitz constant M bounds the factor by which the noise is magnified.


Note 1.59. Since $f(\mathbf{0}) = 0$ and $f(\mathbf{1}) = 1$, the Lipschitz constant of any aggregation
function is $M \ge 1/\|\mathbf{1}\|$. For p-norms we have $\|\mathbf{1}\|_p = \sqrt[p]{n \cdot 1} = n^{1/p} \ge 1$, that is
$M \ge n^{-1/p}$, so in principle M can be smaller than 1.


Definition 1.60 (p-stable aggregation functions). Given $p \ge 1$, an
aggregation function is called p-stable if its Lipschitz constant in the p-norm
$\|\cdot\|_p$ is at most 1. An extended aggregation function is p-stable if it can be
represented as a family of p-stable aggregation functions.


15 A distance between objects from a set S is a function defined on S x S, whose
values are non-negative real numbers, with the properties: 1) $d(x, y) = 0$ if and
only if x = y, 2) $d(x, y) = d(y, x)$, and 3) $d(x, z) \le d(x, y) + d(y, z)$ (triangular
inequality). Such distance is called a metric.

16 A norm is a function f on a vector space with the properties: 1) $f(x) > 0$ for all
nonzero x and $f(0) = 0$, 2) $f(ax) = |a| f(x)$, and 3) $f(x + y) \le f(x) + f(y)$.

17 I.e., it is differentiable on its entire domain, except for a subset of measure zero.
Evidently, p-stable aggregation functions do not enhance input inaccuracies,
as $|f(\tilde{\mathbf{x}}) - f(\mathbf{x})| \le \|\tilde{\mathbf{x}} - \mathbf{x}\|_p = \|\boldsymbol{\delta}\|_p$.


Definition 1.61 (1-Lipschitz aggregation functions). An aggregation 
function f is called 1-Lipschitz if it is p-stable with p = 1, i.e., for all x,y: 





$$|f(\mathbf{x}) - f(\mathbf{y})| \le |x_1 - y_1| + |x_2 - y_2| + \ldots + |x_n - y_n|.$$


Definition 1.62 (Kernel aggregation functions). An aggregation function
f is called kernel if it is p-stable with $p = \infty$, i.e., for all x, y:

$$|f(\mathbf{x}) - f(\mathbf{y})| \le \max_{i=1,\ldots,n} |x_i - y_i|.$$


For kernel aggregation functions, the error in the output cannot exceed 
the largest error in the input vector. 


Note 1.63. If an aggregation function is p-stable for a given p > 1, then it is also
q-stable for any $1 \le q < p$. This is because $\|\mathbf{x}\|_p \le \|\mathbf{x}\|_q$ for all x.


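Example 1.64. The minimum and maximum are p-stable extended aggregation
functions for any p, and so is the arithmetic mean. The product is 1-Lipschitz
(p-stable with p = 1), but it is not p-stable for p > 1: for instance, moving
both arguments of the two-argument product from 1 to $1 - \varepsilon$ changes the output
by $2\varepsilon - \varepsilon^2$, which exceeds the p-norm distance $2^{1/p}\varepsilon$ between the two inputs
once $\varepsilon$ is small enough. The geometric mean is not Lipschitz, although it is
continuous (see footnote 18).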


1.4 Main families and prototypical examples 


1.4.1 Min and Max 


The minimum and maximum functions are the two main aggregation func- 
tions that are used in fuzzy set theory and fuzzy logic. This is partly due to 
the fact that they are the only two operations consistent with a number of 
set-theoretical properties, and in particular mutual distributivity [33]. These
connectives model fuzzy set intersection and union (or conjunction and dis- 
junction). 

They are defined for any number of arguments as 


$$\min(\mathbf{x}) = \min_{i=1,\ldots,n} x_i, \qquad (1.7)$$
$$\max(\mathbf{x}) = \max_{i=1,\ldots,n} x_i. \qquad (1.8)$$


18 Take $f(x_1, x_2) = \sqrt{x_1 x_2}$, which is continuous for $x_1, x_2 \ge 0$, and let $x_2 = 1$.
$f(t, 1) = \sqrt{t}$ is continuous but not Lipschitz. To see this, let t = 0 and u > 0.
Then $|\sqrt{0} - \sqrt{u}| = \sqrt{u} > Mu = M|0 - u|$, or $u^{-1/2} > M$, for whatever choice of
M, if we make u sufficiently small. Hence the Lipschitz condition fails.
The minimum and maximum are conjunctive and disjunctive extended
aggregation functions respectively, and simultaneously limiting cases of aver- 
aging aggregation functions. 

Both minimum and maximum are symmetric and associative, and 
Lipschitz-continuous (in fact kernel aggregation functions). The min function 
has the neutral element e = 1 and the absorbing element a = 0, and the max 
function has the neutral element e = 0 and the absorbing element a = 1. They 
are dual to each other with respect to the standard negation $N(t) = 1 - t$
(and in fact, any strong negation N)

$$\max(\mathbf{x}) = 1 - \min(\mathbf{1} - \mathbf{x}) = 1 - \min_{i=1,\ldots,n}(1 - x_i),$$

$$\max(\mathbf{x}) = N(\min(N(\mathbf{x}))) = N\left(\min_{i=1,\ldots,n} N(x_i)\right).$$


Most classes and parametric families of aggregation functions include max- 
imum and minimum as members or as the limiting cases. 


1.4.2 Means 


Means are averaging aggregation functions. Formally, a mean is simply a func- 
tion f with the property 0 


$$\min(\mathbf{x}) \le f(\mathbf{x}) \le \max(\mathbf{x}).$$


Still there are other properties that define one or another family of means.
We discuss them in Chapter 2.


Definition 1.65 (Arithmetic mean). The arithmetic mean is the function

$$M(\mathbf{x}) = \frac{1}{n}(x_1 + x_2 + \ldots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i.$$


Definition 1.66 (Weighting vector). A vector $\mathbf{w} = (w_1, \ldots, w_n)$ is called
a weighting vector if $w_i \in [0,1]$ and $\sum_{i=1}^{n} w_i = 1$.


Definition 1.67 (Weighted arithmetic mean). Given a weighting vector
w, the weighted arithmetic mean is the function

$$M_{\mathbf{w}}(\mathbf{x}) = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n = \sum_{i=1}^{n} w_i x_i.$$
Definition 1.68 (Geometric mean). The geometric mean is the function

$$G(\mathbf{x}) = \sqrt[n]{x_1 x_2 \cdots x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}.$$


Definition 1.69 (Harmonic mean). The harmonic mean is the function

$$H(\mathbf{x}) = n \left(\sum_{i=1}^{n} \frac{1}{x_i}\right)^{-1}.$$

We discuss weighted versions of the above means, as well as many other
means in Chapter 2.


1.4.3 Medians 


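Definition 1.70 (Median). The median is the function

$$\mathrm{Med}(\mathbf{x}) = \begin{cases} \frac{1}{2}(x_{(k)} + x_{(k+1)}), & \text{if } n = 2k \text{ is even} \\ x_{(k)}, & \text{if } n = 2k - 1 \text{ is odd,} \end{cases}$$

where $x_{(k)}$ is the k-th largest (or smallest) component of x.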


Definition 1.71 (a-Median). Given a value $a \in [0,1]$, the a-median is the
function

$$\mathrm{Med}_a(\mathbf{x}) = \mathrm{Med}(x_1, \ldots, x_n, \underbrace{a, \ldots, a}_{n-1 \text{ times}}).$$
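A minimal C++ sketch of the median and the a-median follows. Both functions take the input vector by value so that sorting does not reorder the caller's data; the function names are ours.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Median: middle component for odd n, mean of the two middle components
// for even n. x is a copy, so the caller's vector is not reordered.
double median(std::vector<double> x)
{
    assert(!x.empty());
    std::sort(x.begin(), x.end());
    const std::size_t n = x.size();
    if (n % 2 == 1) return x[n / 2];          // n = 2k - 1: the k-th component
    return 0.5 * (x[n / 2 - 1] + x[n / 2]);   // n = 2k: k-th and (k+1)-th
}

// a-median: append n - 1 copies of a to x, then take the median of the
// resulting 2n - 1 values.
double a_median(std::vector<double> x, double a)
{
    assert(!x.empty());
    x.insert(x.end(), x.size() - 1, a);
    return median(x);
}
```

For x = (0.1, 0.9, 0.5) the median is 0.5, while the a-median with a = 0.3 is the median of (0.1, 0.9, 0.5, 0.3, 0.3), which is 0.3.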


1.4.4 Ordered weighted averaging 


Ordered weighted averaging functions (OWA) are also averaging aggregation 
functions, which associate weights not with a particular input, but rather with 
its value. They have been introduced by Yager and have become very 
popular in the fuzzy sets community. 

Let $\mathbf{x}_{\searrow}$ be the vector obtained from x by arranging its components in
non-increasing order $x_{(1)} \ge x_{(2)} \ge \ldots \ge x_{(n)}$.
Definition 1.72 (OWA). Given a weighting vector w, the OWA function is
$$OWA_{\mathbf{w}}(\mathbf{x}) = \sum_{i=1}^{n} w_i x_{(i)} = \langle \mathbf{w}, \mathbf{x}_{\searrow} \rangle.$$


Note that calculation of the value of the OWA function can be done by 
using a sort () operation. If all weights are equal, OWA becomes the arith- 
metic mean. The vector of weights w = (1,0,...,0) yields the maximum and 
w = (0,...,0,1) yields the minimum function. 
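The calculation just described can be sketched in a few lines of C++. The function name owa is ours; the weights are assumed to form a weighting vector of the same length as x, and x is taken by value so that the caller's vector is left untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// OWA: the weights are applied to the inputs sorted in non-increasing order.
double owa(const std::vector<double>& w, std::vector<double> x)
{
    assert(w.size() == x.size());
    std::sort(x.begin(), x.end(), std::greater<double>());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += w[i] * x[i];
    return s;
}
```

For x = (0.3, 0.9, 0.6), the weights (1, 0, 0) give 0.9 (the maximum), the weights (0, 0, 1) give 0.3 (the minimum), and equal weights give the arithmetic mean 0.6 up to rounding.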


1.4.5 Choquet and Sugeno integrals 


These are two classes of averaging aggregation functions defined with respect
to a fuzzy measure. They are useful to model interactions between the
variables $x_i$.


Definition 1.73 (Fuzzy measure). Let $N = \{1, 2, \ldots, n\}$. A discrete fuzzy
measure is a set function (see footnote 19) $v : 2^N \to [0,1]$ which is monotonic
(i.e. $v(S) \le v(T)$ whenever $S \subseteq T$) and satisfies $v(\emptyset) = 0$, $v(N) = 1$.


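Definition 1.74 (Choquet integral). The discrete Choquet integral with
respect to a fuzzy measure v is given by

$$C_v(\mathbf{x}) = \sum_{i=1}^{n} x_{(i)} \left[ v(\{j \mid x_j \ge x_{(i)}\}) - v(\{j \mid x_j \ge x_{(i+1)}\}) \right], \qquad (1.9)$$

where $\mathbf{x}_{\nearrow} = (x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is a non-decreasing permutation of the input
x, and $x_{(n+1)} = \infty$ by convention.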


By rearranging the terms of the sum, Eq. (1.9) can also be written as

$$C_v(\mathbf{x}) = \sum_{i=1}^{n} \left[ x_{(i)} - x_{(i-1)} \right] v(H_i), \qquad (1.10)$$

where $x_{(0)} = 0$ by convention, and $H_i = \{(i), \ldots, (n)\}$ is the subset of indices
of the n - i + 1 largest components of x.

The class of Choquet integrals includes weighted arithmetic means and
OWA functions as special cases. The Choquet integral is a piecewise linear
idempotent function, uniquely defined by its values at the vertices of the unit
cube $[0,1]^n$, i.e., at the points x whose coordinates $x_i \in \{0,1\}$. Note that
there are $2^n$ such points, the same as the number of values that determine the
fuzzy measure v. We consider these functions in detail in Chapter 2.
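Eq. (1.10) translates almost literally into code. The following Python sketch is only an illustration under our own assumptions: the fuzzy measure is stored as a dictionary keyed by frozensets of 0-based indices, and the helper `additive_measure` (a hypothetical name) builds an additive measure so that the special case of the weighted arithmetic mean can be checked.

```python
from itertools import combinations


def choquet(x, v):
    """Discrete Choquet integral via Eq. (1.10).

    v maps frozensets of 0-based indices to values in [0, 1].
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # non-decreasing order
    total = 0.0
    previous = 0.0  # x_(0) = 0 by convention
    for pos, idx in enumerate(order):
        h = frozenset(order[pos:])  # H_i: indices of the n-i+1 largest inputs
        total += (x[idx] - previous) * v[h]
        previous = x[idx]
    return total


def additive_measure(weights):
    """Hypothetical helper: additive fuzzy measure v(S) = sum of w_i, i in S."""
    n = len(weights)
    return {
        frozenset(s): sum(weights[i] for i in s)
        for r in range(n + 1)
        for s in combinations(range(n), r)
    }


w = [0.2, 0.3, 0.5]
x = [0.6, 0.1, 0.8]
v = additive_measure(w)
print(choquet(x, v))                          # Choquet integral
print(sum(wi * xi for wi, xi in zip(w, x)))   # weighted arithmetic mean
```

With an additive measure the two printed values agree up to rounding, which is the weighted arithmetic mean special case mentioned above. The dictionary holds all $2^n$ values of the measure, so this representation is only practical for small n.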


19 A set function is a function whose domain consists of all possible subsets of N.
For example, for n = 3, a set function is specified by $2^3 = 8$ values at $v(\emptyset)$, $v(\{1\})$,
$v(\{2\})$, $v(\{3\})$, $v(\{1,2\})$, $v(\{1,3\})$, $v(\{2,3\})$, $v(\{1,2,3\})$.
Definition 1.75 (Sugeno integral). The Sugeno integral with respect to a
fuzzy measure v is given by

$$S_v(\mathbf{x}) = \max_{i=1,\ldots,n} \min\{x_{(i)}, v(H_i)\}, \qquad (1.11)$$

where $\mathbf{x}_{\nearrow} = (x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is a non-decreasing permutation of the input
x, and $H_i = \{(i), \ldots, (n)\}$.


In the special case of a symmetric fuzzy measure (i.e., when $v(H_i) =
v(|H_i|)$ depends only on the cardinality of the set $H_i$), the Sugeno integral becomes
the median of 2n - 1 values, $S_v(\mathbf{x}) = \mathrm{Med}(x_1, \ldots, x_n, v(n-1), v(n-2), \ldots, v(1))$.
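As a small illustration of Eq. (1.11), the Python sketch below computes the Sugeno integral and compares it, for a symmetric measure, with the median of the 2n - 1 values $x_1, \ldots, x_n, v(1), \ldots, v(n-1)$. The dictionary representation of the measure and the helper `symmetric_measure` are assumptions made for this example only.

```python
from itertools import combinations
from statistics import median


def sugeno(x, v):
    """Discrete Sugeno integral, Eq. (1.11).

    v maps frozensets of 0-based indices to values in [0, 1].
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # non-decreasing order
    return max(
        min(x[idx], v[frozenset(order[pos:])])  # min{x_(i), v(H_i)}
        for pos, idx in enumerate(order)
    )


def symmetric_measure(card_values):
    """Hypothetical helper: v(S) = card_values[|S|], with v(empty)=0, v(N)=1."""
    n = len(card_values) - 1
    return {
        frozenset(s): card_values[r]
        for r in range(n + 1)
        for s in combinations(range(n), r)
    }


card = [0.0, 0.2, 0.6, 1.0]       # v(0), v(1), v(2), v(3), non-decreasing
x = [0.9, 0.1, 0.5]
v = symmetric_measure(card)
print(sugeno(x, v))               # Sugeno integral
print(median(x + card[1:-1]))     # median of x_1..x_n and v(1)..v(n-1)
```

Both printed values coincide for this input, in line with the median representation for symmetric measures.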


1.4.6 Conjunctive and disjunctive functions 


The prototypical examples of conjunctive and disjunctive aggregation
functions are so-called triangular norms and conorms respectively (t-norms and
t-conorms). They are treated in detail in Chapter 3, and below are just a
few typical examples. All functions in Examples 1.76-1.79 are symmetric and
associative.


Example 1.76. The product is a conjunctive extended aggregation function (it
is a t-norm)

$$T_P(\mathbf{x}) = \prod_{i=1}^{n} x_i.$$

Example 1.77. The dual product, also called probabilistic sum, is a disjunctive
extended aggregation function (it is a t-conorm)

$$S_P(\mathbf{x}) = 1 - \prod_{i=1}^{n} (1 - x_i).$$


Example 1.78. Lukasiewicz triangular norm and conorm are conjunctive and
disjunctive extended aggregation functions

$$T_L(\mathbf{x}) = \max\left(0, \sum_{i=1}^{n} x_i - (n - 1)\right),$$

$$S_L(\mathbf{x}) = \min\left(1, \sum_{i=1}^{n} x_i\right).$$


Example 1.79. Einstein sum is a disjunctive aggregation function (it is a
t-conorm). Its bivariate form is given by

$$f(x_1, x_2) = \frac{x_1 + x_2}{1 + x_1 x_2}.$$

Example 1.80. The function

$$f(x_1, x_2) = x_1 x_2^2$$

is a conjunctive ($x_1 x_2^2 \le x_1 x_2 \le \min(x_1, x_2)$), asymmetric aggregation
function. It is not a t-norm.
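The functions of Examples 1.76-1.80 are easy to code. The Python sketch below is a minimal illustration; the function names are ours, and the final loop merely checks on one sample input that the conjunctive functions stay below the minimum and the disjunctive ones above the maximum.

```python
from math import prod


def t_product(x):
    """Product t-norm (Example 1.76)."""
    return prod(x)


def s_probabilistic_sum(x):
    """Dual product, or probabilistic sum (Example 1.77)."""
    return 1 - prod(1 - xi for xi in x)


def t_lukasiewicz(x):
    """Lukasiewicz t-norm (Example 1.78)."""
    return max(0.0, sum(x) - (len(x) - 1))


def s_lukasiewicz(x):
    """Lukasiewicz t-conorm (Example 1.78)."""
    return min(1.0, sum(x))


def einstein_sum(x1, x2):
    """Einstein sum, bivariate form (Example 1.79)."""
    return (x1 + x2) / (1 + x1 * x2)


def asymmetric_conjunctive(x1, x2):
    """f(x1, x2) = x1 * x2^2 (Example 1.80); conjunctive but not a t-norm."""
    return x1 * x2 ** 2


x = [0.5, 0.8]
for f in (t_product, t_lukasiewicz):
    assert f(x) <= min(x)          # conjunctive: bounded by the minimum
for f in (s_probabilistic_sum, s_lukasiewicz):
    assert f(x) >= max(x)          # disjunctive: bounded by the maximum
print(t_product(x), s_probabilistic_sum(x), einstein_sum(*x))
```

Note that `asymmetric_conjunctive(0.5, 0.8)` and `asymmetric_conjunctive(0.8, 0.5)` differ, which shows the asymmetry of Example 1.80.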


1.4.7 Mixed aggregation 


In some situations, high input values are required to reinforce each other 
whereas low values pull the output down. Thus the aggregation function has 
to be disjunctive for high values, conjunctive for low values, and perhaps aver- 
aging if some values are high and some are low. This is typically the case when 
high values are interpreted as “positive” information, and low values as “neg- 
ative” information. The classical expert systems MYCIN and PROSPECTOR 
[38, 88] use precisely this type of aggregation (on the [-1,1] interval).

A different behavior may also be needed: aggregation of both high and low 
values moves the output towards some intermediate value. Thus certain aggre- 
gation functions need to be conjunctive, disjunctive or averaging in different 
parts of their domain. 

Uninorms and nullnorms (see Chapter 4) are typical examples of such
aggregation functions, but there are many others. 


Example 1.81. The 3-Π function is

$$f(\mathbf{x}) = \frac{\prod_{i=1}^{n} x_i}{\prod_{i=1}^{n} x_i + \prod_{i=1}^{n} (1 - x_i)},$$

with the convention $\frac{0}{0} = 0$. It is conjunctive on $[0, \frac{1}{2}]^n$, disjunctive on $[\frac{1}{2}, 1]^n$
and averaging elsewhere. It is associative, with the neutral element $e = \frac{1}{2}$,
and discontinuous on the boundaries of $[0,1]^n$. It is a uninorm.
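A short Python sketch of the 3-Π function may help to see the mixed behaviour. The function name `three_pi` is our own; the convention 0/0 = 0 is handled explicitly.

```python
from math import prod


def three_pi(x):
    """3-Pi function of Example 1.81, with the convention 0/0 = 0."""
    p = prod(x)
    q = prod(1 - xi for xi in x)
    if p + q == 0:
        return 0.0
    return p / (p + q)


print(three_pi([0.2, 0.4]))   # both inputs below 1/2: output below the minimum
print(three_pi([0.7, 0.9]))   # both inputs above 1/2: output above the maximum
print(three_pi([0.5, 0.3]))   # 1/2 is neutral: the other input is returned
print(three_pi([0.0, 1.0]))   # the 0/0 case
```

Low inputs pull the output down, high inputs reinforce each other, and the value 1/2 acts as the neutral element.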


1.5 Composition and transformation 
of aggregation functions 


We have examined several prototypical examples of aggregation functions from 
different classes. Of course, this is a very limited number of functions, and they 
may not be sufficient to model a specific problem. The question arises as to 
how we can construct new aggregation functions from the existing ones. Which 
properties will be conserved, and which properties will be lost? 

We consider two simple techniques for constructing new aggregation
functions. The first technique is based on the monotonic transformation of the
inputs and the second is based on iterative application of aggregation
functions.

Let us consider univariate strictly increasing bijections (hence continuous)
$\varphi_1, \varphi_2, \ldots, \varphi_n$ and $\psi$; $\varphi_i, \psi : [0,1] \to [0,1]$.


Proposition 1.82. Let $\varphi_1, \ldots, \varphi_n, \psi : [0,1] \to [0,1]$ be strictly increasing
bijections. For any aggregation function f, the function

$$g(\mathbf{x}) = \psi(f(\varphi_1(x_1), \varphi_2(x_2), \ldots, \varphi_n(x_n)))$$

is an aggregation function.


Note 1.83. The functions $\varphi_i$, $\psi$ may also be strictly decreasing (but all at the same
time); we already saw in Section 1.3.3 that if we choose each $\varphi_i$ and $\psi$ as a strong
negation, then we obtain a dual aggregation function g.

Of course, in general nothing can be said about the properties of g. However in some
special cases we can establish which properties remain intact.


Proposition 1.84. Let f be an aggregation function and let g be defined as
in Proposition 1.82. Then

• If f is continuous, so is g.
• If f is symmetric and $\varphi_1 = \varphi_2 = \ldots = \varphi_n$, then g is symmetric.
• If f is associative and $\psi^{-1} = \varphi_1 = \ldots = \varphi_n$, then g is associative.

Next, take $\varphi_1(t) = \varphi_2(t) = \ldots = \varphi_n(t) = t$. That is, consider a
composition of functions $g(\mathbf{x}) = (\psi \circ f)(\mathbf{x}) = \psi(f(\mathbf{x}))$. We examine how the choice of
$\psi$ can affect the behavior of the aggregation function f.

It is clear that $\psi$ needs to be monotone non-decreasing. Depending on its
properties, it can modify the type of the aggregation function.


Proposition 1.85. Let f be an aggregation function and let $\psi : [0,1] \to
[0,1]$ be a non-decreasing function satisfying $\psi(0) = 0$ and $\psi(1) = 1$. If
$\psi(f(1, \ldots, 1, t, 1, \ldots, 1)) \le t$ for all $t \in [0,1]$ and at any position, then
$g = \psi \circ f$ is a conjunctive aggregation function.


Proof. The proof is simple: for any fixed position i, and any x we have

$$g(\mathbf{x}) \le \max_{x_j \in [0,1],\, j \ne i} g(\mathbf{x}) = g(1, \ldots, 1, x_i, 1, \ldots, 1) \le x_i.$$

This holds for every i, therefore $g(\mathbf{x}) \le \min(\mathbf{x})$. By applying Proposition we complete the proof.


Proposition 1.85 will be mainly used when choosing f as an averaging
aggregation function. Not every averaging aggregation function can be converted
to a conjunctive function using Proposition 1.85: the value f(x) must be
distinct from 1 whenever some component of x is smaller than 1. Moreover, if
$\max_{i=1,\ldots,n} f(1, \ldots, 1, 0, 1, \ldots, 1) = a < 1$ with 0 in the i-th position, then
necessarily $\psi(t) = 0$ for $t \le a$. If $\psi$ is a bijection, then f must have a = 0 as
absorbing element. The main point of Proposition 1.85 is that one can construct
conjunctive aggregation functions from many types of averaging functions
(discussed in Chapter 2) by a simple transformation, and that its condition
involves single variate functions $\psi(f(1, \ldots, 1, t, 1, \ldots, 1))$, which is not
difficult to verify. We will use this simple result in Section 8.4.16 when
constructing asymmetric conjunctive and disjunctive functions.


Note 1.86. A similar construction also transforms averaging functions with the
absorbing element a = 1, if $\psi$ is a bijection, to disjunctive functions (by using duality).
However it does not work the other way around, i.e., to construct averaging
functions from either conjunctive or disjunctive functions. This can be achieved by using
the idempotization method, see [43], p. 28.


Example 1.87. Take the geometric mean $f(\mathbf{x}) = \sqrt{x_1 x_2}$, which is an averaging
function with the absorbing element a = 0. Take $\psi(t) = t^2$. Composition
$(\psi \circ f)(\mathbf{x}) = x_1 x_2$ yields the product function, which is conjunctive.

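Example 1.88. Take the harmonic mean $f(\mathbf{x}) = 2\left(\frac{1}{x_1} + \frac{1}{x_2}\right)^{-1} = \frac{2 x_1 x_2}{x_1 + x_2}$, which
also has the absorbing element a = 0. Take again $\psi(t) = t^2$. Composition
$g(\mathbf{x}) = (\psi \circ f)(\mathbf{x}) = \frac{(2 x_1 x_2)^2}{(x_1 + x_2)^2}$ is a conjunctive aggregation function (we can
check that $g(x_1, 1) = \frac{4 x_1^2}{(1 + x_1)^2} \le x_1$). Now take $\psi(t) = \frac{t}{2 - t}$. A simple
computation yields $g(x_1, 1) = x_1$ and $g(x_1, x_2) = \frac{x_1 x_2}{x_1 + x_2 - x_1 x_2}$, a Hamacher triangular
norm $T_0^H$, which is conjunctive.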
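The second computation of Example 1.88 can be checked numerically. The following Python sketch is an illustration only; the function names are ours, and the grid of test points is an arbitrary choice.

```python
def harmonic2(x1, x2):
    """Bivariate harmonic mean; 0 is the absorbing element."""
    if x1 == 0 or x2 == 0:
        return 0.0
    return 2 * x1 * x2 / (x1 + x2)


def psi(t):
    """The transformation psi(t) = t / (2 - t) from Example 1.88."""
    return t / (2 - t)


def hamacher_product(x1, x2):
    """Closed form x1*x2 / (x1 + x2 - x1*x2), with value 0 at (0, 0)."""
    if x1 == 0 and x2 == 0:
        return 0.0
    return x1 * x2 / (x1 + x2 - x1 * x2)


def g(x1, x2):
    """Composition psi(f(x)) with f the harmonic mean."""
    return psi(harmonic2(x1, x2))


grid = [i / 10 for i in range(11)]
for a in grid:
    for b in grid:
        assert abs(g(a, b) - hamacher_product(a, b)) < 1e-12
        assert g(a, b) <= min(a, b) + 1e-12   # conjunctive behaviour
print(g(0.5, 1.0), g(0.5, 0.8))
```

On the whole grid the composition agrees with the closed form and never exceeds the minimum, as Proposition 1.85 predicts.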


Let us now consider an iterative application of aggregation functions.
Consider three aggregation functions $f, g : [0,1]^n \to [0,1]$ and $h : [0,1]^2 \to [0,1]$,
i.e., h is a bivariate function. Then the combination

$$H(\mathbf{x}) = h(f(\mathbf{x}), g(\mathbf{x}))$$

is also an aggregation function. It is continuous if f, g and h are. Depending
on the properties of these functions, the resulting aggregation function may
also possess certain properties.


Proposition 1.89. Let f and g be n-ary aggregation functions, h be a
bivariate aggregation function, and let H be defined as $H(\mathbf{x}) = h(f(\mathbf{x}), g(\mathbf{x}))$.
Then

• If f and g are symmetric, then H is also symmetric.
• If f, g and h are averaging functions, then H is averaging.
• If f, g and h are associative, H is not necessarily associative.
• If any or all of f, g and h have a neutral element, H does not necessarily
  have a neutral element.
• If f, g and h are conjunctive (disjunctive), H is also conjunctive
  (disjunctive).

Previously we mentioned that in certain applications the use of the [0,1] scale
is not very intuitive. One situation is when we aggregate pieces of “positive”
and “negative” information, for instance evidence that confirms and
disconfirms a hypothesis. It may be more natural to use a bipolar [-1,1] scale, in
which negative values refer to negative evidence and positive values refer to
positive evidence. In some early expert systems (MYCIN [38] and
PROSPECTOR [88]) the [-1,1] scale was used.

The question is whether the use of a different scale brings anything new 
to the mathematics of aggregation. The answer is negative: the aggregation
functions on two different closed intervals are isomorphic, i.e., any aggregation
function on the scale [a, b] can be obtained by a simple linear transformation
from an aggregation function on [0,1]. Thus, the choice of the scale is a ques- 
tion of interpretability, not of the type of aggregation. 

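Transformation from one scale to another is straightforward, and it can be
done in many different ways. The most common formulas are the following.
Let $f^{[a,b]}$ be an aggregation function on the interval [a,b], and let $f^{[0,1]}$ be the
corresponding aggregation function on [0,1]. Then

$$f^{[a,b]}(x_1, \ldots, x_n) = (b - a)\, f^{[0,1]}\left(\frac{x_1 - a}{b - a}, \ldots, \frac{x_n - a}{b - a}\right) + a, \qquad (1.12)$$

$$f^{[0,1]}(x_1, \ldots, x_n) = \frac{f^{[a,b]}((b - a) x_1 + a, \ldots, (b - a) x_n + a) - a}{b - a}, \qquad (1.13)$$

or in vector form

$$f^{[a,b]}(\mathbf{x}) = (b - a)\, f^{[0,1]}\left(\frac{\mathbf{x} - a}{b - a}\right) + a,$$

$$f^{[0,1]}(\mathbf{x}) = \frac{f^{[a,b]}((b - a)\mathbf{x} + a) - a}{b - a}.$$
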

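Thus for transformation to and from a bipolar scale we use

$$f^{[-1,1]}(x_1, \ldots, x_n) = 2 f^{[0,1]}\left(\frac{x_1 + 1}{2}, \ldots, \frac{x_n + 1}{2}\right) - 1, \qquad (1.14)$$

$$f^{[0,1]}(x_1, \ldots, x_n) = \frac{f^{[-1,1]}(2 x_1 - 1, \ldots, 2 x_n - 1) + 1}{2}. \qquad (1.15)$$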
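The change of scale is a pair of wrappers around an existing function. The Python sketch below follows Eqs. (1.12) and (1.13); the names `to_interval` and `to_unit` and the choice of the probabilistic sum as test function are assumptions made for illustration.

```python
from math import prod


def to_interval(f01, a, b):
    """Eq. (1.12): turn a function on [0,1] into one on [a,b]."""
    def f_ab(x):
        scaled = [(xi - a) / (b - a) for xi in x]
        return (b - a) * f01(scaled) + a
    return f_ab


def to_unit(f_ab, a, b):
    """Eq. (1.13): turn a function on [a,b] into one on [0,1]."""
    def f01(x):
        scaled = [(b - a) * xi + a for xi in x]
        return (f_ab(scaled) - a) / (b - a)
    return f01


def probabilistic_sum(x):
    return 1 - prod(1 - xi for xi in x)


bipolar = to_interval(probabilistic_sum, -1.0, 1.0)   # Eq. (1.14)
back = to_unit(bipolar, -1.0, 1.0)                    # Eq. (1.15)

print(bipolar([0.0, 0.0]))                 # two neutral-looking inputs on [-1,1]
print(back([0.2, 0.7]), probabilistic_sum([0.2, 0.7]))  # round trip
```

The round trip returns the original function up to rounding, which illustrates that the two scales carry the same information.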


1.6 How to choose an aggregation function 


There are infinitely many aggregation functions. They are grouped in various 
families, such as means, triangular norms and conorms, Choquet and Sugeno 
integrals, and many others. The question is how to choose the most suitable 
aggregation function for a specific application. Is one aggregation function 
enough, or should different aggregation functions be used in different parts of
the application? 

There are two components to the answer. First of all, the selected ag- 
gregation function must be consistent with the semantics of the aggregation 
procedure. That is, if one models a conjunction, averaging or disjunctive ag- 
gregation functions are not suitable. Should the aggregation function be sym- 
metric, have a neutral or absorbing element, or be idempotent? Is the number 
of inputs always the same? What is the interpretation of input values? An- 
swering these questions should result in a number of mathematical properties, 
based on which a suitable class or family can be chosen. 

The second issue is to choose the appropriate member of that class or 
family, which does what it is supposed to do — produces adequate outputs 
for given inputs. It is expected that the developer of a system has some rough 
idea of what the appropriate outputs are for some prototype inputs. Thus we 
arrive at the issue of fitting the data. 

The data may come from different sources and in different forms. First, it 
could be the result of some mental experiment: let us take the input values 
(1,0,0). What output do we expect? 

Second, the developer of an application could ask the domain experts to 
provide their opinion on the desired outputs for selected inputs. This can be 
done by presenting the experts some prototypical cases (either the input vec- 
tors, or domain specific situations before they are translated into the inputs). 
If there is more than one expert, their outputs could be either averaged, or 
translated into the range of possible output values, or the experts could be 
brought together to find a consensus. 

Third, the data could be collected in an experiment, by asking a group of 
lay people or experts about their input and output values, but without asso- 
ciating these values with some aggregation rule. For example, an interesting 
experiment reported in [286, 287] consisted in asking a group of people about
the membership values they would use for different objects in the fuzzy sets 
“metallic”, “container”, and then in the combined set “metallic container”. 
The goal was to determine a model for intersection of two sets. The subjects 
were asked the questions about membership values on three separate days, to 
discourage them from building some inner model for aggregation. 

Fourth, the data can be collected automatically by observing the responses 
of subjects to various stimuli. For example, by presenting a user of a computer 
system with some information and recording their actions or decisions. 

In the most typical case, the data comes in pairs (x, y), where $\mathbf{x} \in [0,1]^n$
is the input vector and $y \in [0,1]$ is the desired output. There are several
pairs, which will be denoted by a subscript k: $(\mathbf{x}_k, y_k)$, k = 1,...,K. However
there are variations of the data set: a) some components of vectors $\mathbf{x}_k$ may
be missing, b) vectors $\mathbf{x}_k$ may have varying dimension by construction, and
c) the outputs $y_k$ could be specified as a range of values (i.e., the interval
$[\underline{y}_k, \overline{y}_k]$).




In fitting an aggregation function to the data, we will distinguish interpo- 
lation and approximation problems. In the case of interpolation, our aim is to 
fit the specified output values exactly. For instance, the pairs ((0,0,...,0),0) 
and ((1,1,...,1), 1) should always be interpolated. On the other hand, when 
the data comes from an experiment, it will normally contain some errors, and 
therefore it is pointless to interpolate the inaccurate values $y_k$. In this case our
aim is to stay close to the desired outputs without actually matching them. 
This is the approximation problem. 

There are of course other issues to take into account when choosing an 
aggregation function, such as simplicity, numerical efficiency, ease of
interpretation, and so on [286]. There are no general rules here, and it is up to
the system developer to make an educated choice. In what follows, we concen- 
trate on the first two criteria: to be consistent with semantically important 
properties of the aggregation procedure, and to fit the desired data. 

We now formalize the selection problem. 


Problem 1.90 (Selection of an aggregation function). Let us have a
number of mathematical properties $P_1, P_2, \ldots$ and the data $D = \{(\mathbf{x}_k, y_k)\}_{k=1}^{K}$.
Choose an aggregation function f consistent with $P_1, P_2, \ldots$, and satisfying
$f(\mathbf{x}_k) \approx y_k$, k = 1,...,K.


The approximate equalities may of course be satisfied exactly, if the
properties $P_1, P_2, \ldots$ allow this. We shall also consider a variation of the
selection problem when $y_k$ are given as intervals, in which case we require
$f(\mathbf{x}_k) \in [\underline{y}_k, \overline{y}_k]$, or even approximately satisfy this condition.

The satisfaction of approximate equalities $f(\mathbf{x}_k) \approx y_k$ is usually translated
into the following minimization problem.


$$\begin{array}{ll} \text{minimize} & \|\mathbf{r}\| \\ \text{subject to} & f \text{ satisfies } P_1, P_2, \ldots, \end{array} \qquad (1.16)$$

where $\|\mathbf{r}\|$ is the norm of the residuals, i.e., $\mathbf{r} \in \mathbb{R}^K$ is the vector of the
differences between the predicted and observed values, $r_k = f(\mathbf{x}_k) - y_k$. There
are many ways to choose the norm, and the most popular are the least squares
norm

$$\|\mathbf{r}\|_2 = \left( \sum_{k=1}^{K} r_k^2 \right)^{1/2},$$

the least absolute deviation norm

$$\|\mathbf{r}\|_1 = \sum_{k=1}^{K} |r_k|,$$

the Chebyshev norm

$$\|\mathbf{r}\|_\infty = \max_{k=1,\ldots,K} |r_k|,$$

or their weighted analogues, like

$$\|\mathbf{r}\| = \left( \sum_{k=1}^{K} u_k r_k^2 \right)^{1/2},$$

where the weight $u_k > 0$ determines the relative importance to fit the k-th
value $y_k$ (see footnote 20).
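The norms of the residuals are one-liners in code. The Python sketch below is a minimal illustration; the function names and the tiny data set are hypothetical, chosen only to show how the same residual vector is scored by different norms.

```python
from math import sqrt


def residuals(f, data):
    """r_k = f(x_k) - y_k for a list of (x_k, y_k) pairs."""
    return [f(x) - y for x, y in data]


def norm_least_squares(r):
    return sqrt(sum(rk * rk for rk in r))


def norm_least_absolute(r):
    return sum(abs(rk) for rk in r)


def norm_chebyshev(r):
    return max(abs(rk) for rk in r)


def norm_weighted_least_squares(r, u):
    """u_k > 0 is the relative importance of fitting the k-th value."""
    return sqrt(sum(uk * rk * rk for uk, rk in zip(u, r)))


def mean(x):
    return sum(x) / len(x)


# Hypothetical data: pairs (input vector, desired output).
data = [([0.2, 0.4], 0.25), ([0.9, 0.5], 0.8), ([1.0, 1.0], 1.0)]
r = residuals(mean, data)
print(norm_least_squares(r), norm_least_absolute(r), norm_chebyshev(r))
print(norm_weighted_least_squares(r, [1.0, 4.0, 1.0]))
```

Raising a weight $u_k$ makes the corresponding residual count more, which is how differently reliable observations can be reflected in the fit.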


Example 1.91. Consider choosing the weights of a weighted arithmetic mean
consistent with the data set $\{(\mathbf{x}_k, y_k)\}_{k=1}^{K}$ using the least squares approach.
We minimize the sum of squares

$$\begin{array}{ll} \text{minimize} & \displaystyle\sum_{k=1}^{K} \left( \sum_{i=1}^{n} w_i x_{ik} - y_k \right)^2 \\ \text{subject to} & \displaystyle\sum_{i=1}^{n} w_i = 1, \\ & w_1, \ldots, w_n \ge 0. \end{array}$$

This is a quadratic programming problem (see Section A.5), which is solved
by a number of standard methods.
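To make Example 1.91 concrete, here is a small Python sketch. It does not use a quadratic programming solver; instead it applies projected gradient descent onto the set of weighting vectors, a simple substitute chosen only to keep the example self-contained. The function names, the step size rule and the data are all assumptions of this illustration.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum of w_i = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t          # keep the threshold of the last valid index
    return [max(vi - theta, 0.0) for vi in v]


def fit_weights(data, steps=2000):
    """Least squares fit of a weighted arithmetic mean to (x_k, y_k) pairs."""
    n = len(data[0][0])
    w = [1.0 / n] * n
    # Step size 1/L, where L bounds the Lipschitz constant of the gradient.
    lipschitz = 2.0 * sum(sum(xi * xi for xi in x) for x, _ in data)
    lr = 1.0 / lipschitz
    for _ in range(steps):
        grad = [0.0] * n
        for x, y in data:
            r = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(n):
                grad[i] += 2.0 * r * x[i]
        w = project_to_simplex([wi - lr * gi for wi, gi in zip(w, grad)])
    return w


# Hypothetical data generated by the weights (0.2, 0.3, 0.5).
data = [
    ([1.0, 0.0, 0.0], 0.2),
    ([0.0, 1.0, 0.0], 0.3),
    ([0.0, 0.0, 1.0], 0.5),
    ([0.5, 0.5, 0.5], 0.5),
]
print(fit_weights(data))
```

For real applications a dedicated QP or non-negative least squares routine is the better tool; the sketch only shows that the constraints of the example are easy to respect.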


In some studies [138] it was suggested that for decision making problems,
the actual numerical value of the output $f(\mathbf{x}_k)$ was not as important as the
ranking of the outputs. For instance, if $y_k \le y_l$, then it should be $f(\mathbf{x}_k) \le
f(\mathbf{x}_l)$. Indeed, people are not really good at assigning consistent numerical
scores to their preferences, but they are good at ranking the alternatives.
Thus it is argued that a suitable choice of aggregation function should
be consistent with the ranking of the outputs $y_k$ rather than their numerical
values. The use of the mentioned fitting criteria does not preserve the ranking
of outputs, unless they are interpolated. Preservation of ranking of outputs
can be done by imposing the constraints $f(\mathbf{x}_k) \le f(\mathbf{x}_l)$ if $y_k \le y_l$ for all pairs
k, l. We will consider this in detail in Chapter 5.


1.7 Numerical approximation and optimization tools 


The choice of aggregation functions based on the empirical data requires
solving a regression (or approximation) problem, subject to certain constraints,
see Problem (1.16). In this section we briefly outline a number of useful
numerical tools that will allow us to solve such problems. A more detailed discussion
is provided in Appendix A, p. 305.

An approximation problem involves fitting a function from a certain class
to the data $(\mathbf{x}_k, y_k)$, k = 1,...,K. If the data can be fitted exactly, it is
called an interpolation problem. The goal of approximation/interpolation is
to build a model of the function f, which allows one to calculate its values at x
different from the data. There are many different methods of approximation,
starting from linear regression, polynomial regression, spline functions, radial
basis functions, neural networks and so on.

20 Values $y_k$ may have been recorded with different accuracies, or specified by experts
of different standing.

The best studied case is the univariate approximation/interpolation. The 
methods of Newton and Lagrange interpolation, polynomial and spline ap- 
proximation are the classical tools. However it is often needed to impose ad- 
ditional restrictions on the approximation f, the most important for us will 
be monotonicity. The methods of constrained approximation discussed in
Appendix A will often be used for construction of some classes of aggregation
functions. 

Of course, aggregation functions are multivariate functions, hence the 
methods of univariate approximation are generally not applicable. The data do 
not have a special structure, so we will need methods of multivariate scattered 
data approximation. Further, aggregation functions have additional proper- 
ties (at the very least monotonicity, but other properties like idempotency, 
symmetry and neutral element may be needed). Hence we need methods of 
constrained multivariate approximation. Specifically we will employ methods 
of tensor-product spline approximation and Lipschitz approximation, outlined
in Appendix A.

An approximation problem typically involves a solution to an optimization
problem of type (1.16). Depending on the properties of the chosen norm $\|\mathbf{r}\|$
and on the properties $P_i$, it may turn out to be a general nonlinear
optimization problem, or a problem from a special class. The latter is very useful, as
certain optimization problems have well researched solution techniques and 
proven algorithms. It is important to realize this basic fact, and to formulate 
the approximation problem in such a way that its special structure can be 
exploited, rather than attempting to solve it with raw computational power. 

It is also important to realize that a general non-linear optimization
problem typically has many local optima, which are not the actual solutions (i.e.,
not the absolute minima and maxima). This is illustrated in Fig. A.4 on p. 323.
The majority of methods of non-linear minimization are local methods,
i.e., they converge to a local optimum of the objective function, not to its
absolute (or global) minimum (see footnote 21).

The number of local optima could be very high, of order $10^{10}$ and more,
and it grows exponentially with the number of variables. Therefore their explicit
enumeration is practically infeasible. There are several global optimization
methods for this type of problem (examples are random start, simulated
annealing, tabu search, genetic algorithms, deterministic optimization), which
we discuss in Appendix A.


21 Frequently the term globally convergent is used to characterize such local meth- 
ods. It means that the method converges to a local minimum from any initial 
approximation, not that it converges to the global minimum. 
On the other hand, some structured optimization problems involve a
convex objective function (or a variant of it). In such cases there is a unique local
minimum, which is therefore the global minimum. Local optimization methods
will easily find it. The difficulty in this kind of problem typically lies in the
constraints. When the constraints involve linear equalities and inequalities,
and the objective function is either linear or convex quadratic, the problem is
called either a linear or quadratic programming problem (LP or QP). These
two special problems are extensively studied, and a number of very efficient
algorithms for their solution are available (see Appendix A).

We want to stress the need to apply the right tool for solving each type of 
approximation or optimization problem. Generic off-the-shelf methods would 
be extremely inefficient if one fails to recognize and properly deal with a special 
structure of the problem. Even if the optimization problem is linear or convex 
in just a few variables, it is extremely helpful to identify those variables and 
apply efficient specialized methods for solving the respective sub-problems. 

In this book we will extensively rely on solving the following problems. 


• Constrained least squares regression (includes non-negative least squares);
• Constrained least absolute deviation regression;
• Spline interpolation and approximation;
• Multivariate monotone interpolation and approximation;
• Linear programming;
• Quadratic programming;
• Unconstrained non-linear programming;
• Convex optimization;
• Univariate and multivariate global optimization.


If the reader is not familiar with the techniques for solving the mentioned
problems, a brief description of each problem and the available tools for their
solution is given in Appendix A. It also contains references to the current
implementations of the mentioned algorithms.
Averaging Functions


2.1 Semantics 


Averaging is the most common way to combine inputs. It is commonly used 
in voting, multicriteria and group decision making, constructing various per- 
formance scores, statistical analysis, etc. The basic rule is that the total score 
cannot be above or below any of the inputs. The aggregated value is seen as 
some sort of representative value of all the inputs. 

We shall adopt the following generic definition jo, u3) [s3], [37]. 


Definition 2.1 (Averaging aggregation). An aggregation function f is
averaging if for every x it is bounded by

$$\min(\mathbf{x}) \le f(\mathbf{x}) \le \max(\mathbf{x}).$$


Recall that, due to monotonicity of aggregation functions, averaging
functions are idempotent, and vice versa, see Note 1.12, p. 9. That is, an
aggregation function f is averaging if and only if $f(t, \ldots, t) = t$ for any $t \in
[0,1]$.

Formally, the minimum and maximum functions can be considered as
averaging, however they are the limiting cases, right on the border with conjunctive
and disjunctive functions, and will be treated in Chapter 3. There are also
some types of mixed aggregation functions, such as uninorms or nullnorms,
that include averaging functions as particular cases; these will be treated in
Chapter 4.


Measure of orness 


The measure of orness, also called the degree of orness or attitudinal character,
is an important numerical characteristic of averaging aggregation functions. It
was first defined in 1974 by Dujmovic [90], and then rediscovered several
times, see [97, 263], mainly in the context of OWA functions (Section 2.5). It
is applicable to any averaging function (and even to some other aggregation
functions, like ST-OWA, see Chapter 4).

Basically, the measure of orness measures how close a given averaging
function is to the max function, which is the weakest disjunctive function. The
measure of orness is computed for any averaging function [90, 92, 97] using


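Definition 2.2 (Measure of orness). Let f be an averaging aggregation
function. Then its measure of orness is

$$orness(f) = \frac{\int_{[0,1]^n} f(\mathbf{x})\,d\mathbf{x} - \int_{[0,1]^n} \min(\mathbf{x})\,d\mathbf{x}}{\int_{[0,1]^n} \max(\mathbf{x})\,d\mathbf{x} - \int_{[0,1]^n} \min(\mathbf{x})\,d\mathbf{x}}. \qquad (2.1)$$
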


Clearly, orness(max) = 1 and orness(min) = 0, and for any f, $orness(f) \in
[0,1]$. The calculation of the integrals of max and min functions was performed
in [89] and results in simple equations

$$\int_{[0,1]^n} \max(\mathbf{x})\,d\mathbf{x} = \frac{n}{n+1} \quad \text{and} \quad \int_{[0,1]^n} \min(\mathbf{x})\,d\mathbf{x} = \frac{1}{n+1}. \qquad (2.2)$$
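When the integral of f is not available in closed form, the measure of orness of Definition 2.2 can be estimated numerically. The Python sketch below uses plain Monte Carlo sampling for the integral of f and the closed forms (2.2) for max and min; the sampling scheme, sample size and function names are our assumptions, not a method prescribed by the text.

```python
import random


def orness_estimate(f, n, samples=20000, seed=0):
    """Monte Carlo estimate of the measure of orness (2.1) for n >= 2 inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = [rng.random() for _ in range(n)]   # uniform point of [0,1]^n
        total += f(x)
    integral_f = total / samples
    integral_min = 1.0 / (n + 1)               # Eq. (2.2)
    integral_max = n / (n + 1)                 # Eq. (2.2)
    return (integral_f - integral_min) / (integral_max - integral_min)


def arithmetic_mean(x):
    return sum(x) / len(x)


print(orness_estimate(arithmetic_mean, 3))   # close to 1/2
print(orness_estimate(max, 3))               # close to 1
print(orness_estimate(min, 3))               # close to 0
```

The estimates fluctuate with the sample, so they should be read as approximations; deterministic quadrature is preferable for small n.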








A different measure of orness, the average orness value, is proposed in log}. 


Definition 2.3 (Average orness value). Let f be an averaging aggregation
function. Then its average orness value is

$$\overline{orness}(f) = \int_{[0,1]^n} \frac{f(\mathbf{x}) - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})}\,d\mathbf{x}. \qquad (2.3)$$

Both the measure of orness and the average orness value are $\frac{1}{2}$ for weighted
arithmetic means, and later we will see that both quantities coincide for OWA
functions. However computation of the average orness value for other
averaging functions is more involved (typically performed by numerical methods),
therefore we will use mainly the measure of orness in Definition 2.2.



2.2 Classical means 


Means are often treated synonymously with averaging functions. However, the 
classical treatment of means (see, e.g., [40]) excludes certain types of
ing functions, which have been developed quite recently, in particular ordered 
weighted averaging and various integrals. On the other hand some classical 
means (e.g., some Gini means) lack monotonicity, and therefore are not aggre- 
gation functions. Following the tradition, in this section we will concentrate 
on various classical means, and present other types of averaging, or mean-type 
functions in separate sections. 
Arithmetic mean

The arithmetic mean is the most widely used aggregation function. 


Definition 2.4 (Arithmetic mean). The arithmetic mean, or the average
of n values, is the function

$$M(\mathbf{x}) = \frac{1}{n}(x_1 + x_2 + \ldots + x_n) = \frac{1}{n} \sum_{i=1}^{n} x_i.$$


Since M is properly defined for any number of arguments, it is an extended 
aggregation function, see Definition 


Main properties 


• The arithmetic mean M is a strictly increasing aggregation function;
• M is a symmetric function;
• M is an additive function, i.e., $M(\mathbf{x} + \mathbf{y}) = M(\mathbf{x}) + M(\mathbf{y})$ for all $\mathbf{x}, \mathbf{y} \in
  [0,1]^n$ such that $\mathbf{x} + \mathbf{y} \in [0,1]^n$;
• M is a homogeneous function, i.e., $M(\lambda\mathbf{x}) = \lambda M(\mathbf{x})$ for all $\mathbf{x} \in [0,1]^n$ and
  for all $\lambda \in [0,1]$;
• The orness measure $orness(M) = \frac{1}{2}$ (see footnote 2);
• M is a Lipschitz continuous function, with the Lipschitz constant in any
  $\|\cdot\|_p$ norm (see p. 22) equal to $n^{-1/p}$, the smallest Lipschitz constant of all
  aggregation functions.


When the inputs are not symmetric, it is a common practice to associate 
each input with a weight, a number $w_i \in [0,1]$ which reflects the relative
contribution of this input to the total score. For example, in shareholders’ 
meetings, the strength of each vote is associated with the number of shares 
this shareholder possesses. The votes are usually just added to each other, and 
after dividing by the total number of shares, we obtain a weighted arithmetic 
mean. Weights can also represent the reliability of an input or its importance. 

Weights are not the only way to obtain asymmetric functions; we will study
other methods in Section 2.4 and in Chapters 3 and 4. Recall from Chapter 1
the definition of a weighting vector:


Definition 2.5 (Weighting vector). A vector $\mathbf{w} = (w_1, \ldots, w_n)$ is called a
weighting vector if $w_i \in [0,1]$ and $\sum_{i=1}^{n} w_i = 1$.



Definition 2.6 (Weighted arithmetic mean). Given a weighting vector
w, the weighted arithmetic mean is the function

$$M_{\mathbf{w}}(\mathbf{x}) = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n = \sum_{i=1}^{n} w_i x_i = \langle \mathbf{w}, \mathbf{x} \rangle.$$

Main properties


• A weighted arithmetic mean $M_{\mathbf{w}}$ is a strictly increasing aggregation
  function, if all $w_i > 0$;
• $M_{\mathbf{w}}$ is an asymmetric (unless $w_i = 1/n$ for all $i \in \{1, \ldots, n\}$) idempotent
  function;
• $M_{\mathbf{w}}$ is an additive function, i.e., $M_{\mathbf{w}}(\mathbf{x} + \mathbf{y}) = M_{\mathbf{w}}(\mathbf{x}) + M_{\mathbf{w}}(\mathbf{y})$ for all
  $\mathbf{x}, \mathbf{y} \in [0,1]^n$ such that $\mathbf{x} + \mathbf{y} \in [0,1]^n$;
• $M_{\mathbf{w}}$ is a homogeneous function, i.e., $M_{\mathbf{w}}(\lambda\mathbf{x}) = \lambda M_{\mathbf{w}}(\mathbf{x})$ for all $\mathbf{x} \in [0,1]^n$
  and for all $\lambda \in [0,1]$;
• Jensen's inequality: for any convex function (see footnote 1) $g : [0,1] \to [-\infty, \infty]$,
  $g(M_{\mathbf{w}}(\mathbf{x})) \le M_{\mathbf{w}}(g(x_1), \ldots, g(x_n))$;
• $M_{\mathbf{w}}$ is a Lipschitz continuous function, in fact it is a kernel aggregation
  function (see p. 23);
• $M_{\mathbf{w}}$ is a shift-invariant function (see p. 19);
• The orness measure $orness(M_{\mathbf{w}}) = \frac{1}{2}$ (see footnote 2);
• $M_{\mathbf{w}}$ is a special case of the Choquet integral (see Section 2.6) with respect
  to an additive fuzzy measure.


Geometric and harmonic means 


Weighted arithmetic means are good for averaging inputs that can be added 
together. Frequently the inputs are not added but multiplied. For example, 
when averaging the rates of investment return over several years the use of 
the arithmetic mean is incorrect. This is because the rate of return (say 10%) 
signifies that in one year the investment was multiplied by a factor 1.1. If 
the return is 20% in the next year, then the total is multiplied by 1.2, which 
means that the original investment is multiplied by a factor of 1.1 x 1.2. The 
average return is calculated by using the geometric mean of 1.1 and 1.2, which
gives approximately 1.15.
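The investment example takes two lines of Python. The sketch below is illustrative only; `geometric_mean` is our own helper name.

```python
from math import prod


def geometric_mean(x):
    """Geometric mean of positive numbers."""
    return prod(x) ** (1.0 / len(x))


factors = [1.1, 1.2]                 # yearly growth factors (10% and 20%)
g = geometric_mean(factors)
print(g)                             # average yearly growth factor
print(g * g, 1.1 * 1.2)              # two years at rate g give the same total
print(sum(factors) / len(factors))   # the arithmetic mean overstates it
```

Applying the geometric mean factor twice reproduces the total growth 1.1 x 1.2, which the arithmetic mean of the factors does not.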


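1 A function g is convex if and only if $g(\alpha t_1 + (1 - \alpha) t_2) \le \alpha g(t_1) + (1 - \alpha) g(t_2)$
for all $t_1, t_2 \in Dom(g)$ and $\alpha \in [0,1]$.
2 It is easy to check that

$$\int_{[0,1]^n} M(\mathbf{x})\,d\mathbf{x} = \frac{1}{n}\left( \int_0^1 x_1\,dx_1 + \ldots + \int_0^1 x_n\,dx_n \right) = \frac{n}{n} \int_0^1 t\,dt = \frac{1}{2}.$$

Substituting the above value in (2.1) we obtain $orness(M) = \frac{1}{2}$. Similarly, for a
weighted arithmetic mean we also obtain

$$\int_{[0,1]^n} M_{\mathbf{w}}(\mathbf{x})\,d\mathbf{x} = w_1 \int_0^1 x_1\,dx_1 + \ldots + w_n \int_0^1 x_n\,dx_n = \sum_{i=1}^{n} w_i \int_0^1 t\,dt = \frac{1}{2}.$$
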
Fig. 2.1. 3D plots of two weighted arithmetic means.


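Definition 2.7 (Geometric mean). The geometric mean is the function

$$G(x) = \sqrt[n]{x_1 x_2 \cdots x_n} = \left(\prod_{i=1}^n x_i\right)^{1/n}.$$

Definition 2.8 (Weighted geometric mean). Given a weighting vector w,
the weighted geometric mean is the function

$$G_w(x) = \prod_{i=1}^n x_i^{w_i}.$$

Definition 2.9 (Harmonic mean). The harmonic mean is the function

$$H(x) = n\left(\sum_{i=1}^n \frac{1}{x_i}\right)^{-1}.$$

Definition 2.10 (Weighted harmonic mean). Given a weighting vector
w, the weighted harmonic mean is the function

$$H_w(x) = \left(\sum_{i=1}^n \frac{w_i}{x_i}\right)^{-1}.$$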
Note 2.11. If the weighting vector w is given without normalization, i.e.,
$W = \sum_{i=1}^n w_i \neq 1$, then one can either normalize it first by dividing each
component by W, or use the alternative expressions for weighted geometric and
harmonic means

$$G_w(x) = \left(\prod_{i=1}^n x_i^{w_i}\right)^{1/W}, \qquad H_w(x) = W\left(\sum_{i=1}^n \frac{w_i}{x_i}\right)^{-1}.$$





Fig. 2.2. 3D plots of weighted geometric means.
Fig. 2.3. 3D plots of weighted harmonic means.


Geometric-Arithmetic Mean Inequality


The following result is an extended version of the well known
geometric-arithmetic means inequality

$$H_w(x) \le G_w(x) \le M_w(x), \qquad (2.4)$$

for any vector x and weighting vector w, with equality if and only if
$x = (t, t, \ldots, t)$.

Another curious relation between these three means is that for n = 2 we
have $G(x,y) = \sqrt{M(x,y) \cdot H(x,y)}$.
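Inequality (2.4) and the identity for n = 2 are easy to verify numerically. The sketch below is an illustration only: the function names are chosen here, the weights are assumed to be non-negative and to sum to one, and a zero input with positive weight is treated as forcing the geometric and harmonic means to zero.

```python
import math

def arithmetic_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x))

def geometric_mean(x, w):
    # A zero input with positive weight forces the product to zero.
    if any(xi == 0 and wi > 0 for wi, xi in zip(w, x)):
        return 0.0
    return math.exp(sum(wi * math.log(xi) for wi, xi in zip(w, x) if wi > 0))

def harmonic_mean(x, w):
    # Zero is also the absorbing element of the harmonic mean.
    if any(xi == 0 and wi > 0 for wi, xi in zip(w, x)):
        return 0.0
    return 1.0 / sum(wi / xi for wi, xi in zip(w, x) if wi > 0)

x = [0.2, 0.9, 0.5]
w = [0.5, 0.3, 0.2]
h, g, m = harmonic_mean(x, w), geometric_mean(x, w), arithmetic_mean(x, w)

# For n = 2 and equal weights, G equals the square root of M * H.
a, b = 0.3, 0.8
half = [0.5, 0.5]
g2 = geometric_mean([a, b], half)
product = arithmetic_mean([a, b], half) * harmonic_mean([a, b], half)
```

With these inputs `h <= g <= m` holds, and `g2` coincides with `math.sqrt(product)` up to rounding.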


Power means 


A further generalization of the arithmetic mean is a family called power means
(also called root-power means). This family is defined by

Definition 2.12 (Power mean). For $r \in \Re$, the power mean is the function

$$M_{[r]}(x) = \left(\frac{1}{n}\sum_{i=1}^n x_i^r\right)^{1/r},$$

if $r \neq 0$, and $M_{[0]}(x) = G(x)$.³

Definition 2.13 (Weighted power mean). Given a weighting vector w and
$r \in \Re$, the weighted power mean is the function

$$M_{w,[r]}(x) = \left(\sum_{i=1}^n w_i x_i^r\right)^{1/r},$$

if $r \neq 0$, and $M_{w,[0]}(x) = G_w(x)$.


Note 2.14. The family of weighted power means is augmented to $r = -\infty$ and
$r = \infty$ by using the limiting cases

$$M_{w,[-\infty]}(x) = \lim_{r \to -\infty} M_{w,[r]}(x) = \min(x),$$

$$M_{w,[\infty]}(x) = \lim_{r \to \infty} M_{w,[r]}(x) = \max(x).$$

However min and max are not themselves power means.
The limiting case of the weighted geometric mean is also obtained as

$$M_{w,[0]}(x) = \lim_{r \to 0} M_{w,[r]}(x) = G_w(x).$$

3 We shall use square brackets in the notation $M_{[r]}$ for power means to distinguish
them from quasi-arithmetic means $M_g$ (see Section 2.3), where parameter g
denotes a generating function rather than a real number. The same applies to the
weighted power means.


Of course, the family of weighted power means includes the special cases
$M_{w,[1]}(x) = M_w(x)$, and $M_{w,[-1]}(x) = H_w(x)$. Another special case is the
weighted quadratic mean

$$M_{w,[2]}(x) = Q_w(x) = \sqrt{\sum_{i=1}^n w_i x_i^2}.$$





Main properties 


• The weighted power mean $M_{w,[r]}$ is a strictly increasing aggregation
function, if all $w_i > 0$ and $0 < r < \infty$;

• $M_{w,[r]}$ is a continuous function on $[0,1]^n$;

• $M_{w,[r]}$ is an asymmetric idempotent function (symmetric if all $w_i = \frac{1}{n}$);

• $M_{w,[r]}$ is a homogeneous function, i.e., $M_{w,[r]}(\lambda x) = \lambda M_{w,[r]}(x)$ for all
$x \in [0,1]^n$ and for all $\lambda \in [0,1]$; it is the only homogeneous weighted
quasi-arithmetic mean (this class is introduced in Section 2.3);

• Weighted power means are comparable: $M_{w,[r]}(x) \le M_{w,[s]}(x)$ if $r \le s$;
this implies the geometric-arithmetic mean inequality;

• $M_{w,[r]}$ has absorbing element (always a = 0) if and only if $r \le 0$ (and all
weights $w_i$ are positive);

• $M_{w,[r]}$ does not have a neutral element.⁴
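The definition and the limiting cases can be gathered in one routine. The following is a minimal sketch, not a reference implementation: the name `weighted_power_mean` is chosen here, the weights are assumed to be non-negative and to sum to one, inputs with zero weight are ignored, and no protection against floating-point overflow for very large |r| is attempted.

```python
import math

def weighted_power_mean(x, w, r):
    """Weighted power mean M_{w,[r]}(x) for inputs in [0, 1]."""
    pairs = [(wi, xi) for wi, xi in zip(w, x) if wi > 0]
    if r == math.inf:                  # limiting case: maximum
        return max(xi for _, xi in pairs)
    if r == -math.inf:                 # limiting case: minimum
        return min(xi for _, xi in pairs)
    if r <= 0 and any(xi == 0 for _, xi in pairs):
        return 0.0                     # absorbing element a = 0
    if r == 0:                         # weighted geometric mean
        return math.exp(sum(wi * math.log(xi) for wi, xi in pairs))
    return sum(wi * xi ** r for wi, xi in pairs) ** (1.0 / r)

x = [0.2, 0.6, 0.9]
w = [0.2, 0.3, 0.5]
# min, harmonic, geometric, arithmetic, quadratic, max
values = [weighted_power_mean(x, w, r)
          for r in (-math.inf, -1, 0, 1, 2, math.inf)]
```

The list `values` is non-decreasing, which illustrates the comparability property for this particular input.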





Fig. 2.4. 3D plots of weighted quadratic means.


4 The limiting cases min ($r = -\infty$) and max ($r = \infty$), which have neutral elements
e = 1 and e = 0 respectively, are not themselves power means.
Fig. 2.5. 3D plots of power means.


Measure of orness 


Calculations for the geometric mean yield

$$orness(G) = -\frac{1}{n-1} + \frac{n+1}{n-1}\left(\frac{n}{n+1}\right)^n,$$

but for other means explicit formulas are known only for special cases, e.g.,
n = 2

$$\int_{[0,1]^2} Q(x)\,dx = \frac{1}{3}\left(1 + \frac{1}{\sqrt{2}}\ln(1 + \sqrt{2})\right) \approx 0.541075,$$

$$\int_{[0,1]^2} H(x)\,dx = \frac{4}{3}(1 - \ln 2), \text{ and}$$

$$\int_{[0,1]^2} M_{[-2]}(x)\,dx = \frac{2}{3}(2 - \sqrt{2}),$$

from which, when n = 2, $orness(Q) \approx 0.623225$, $orness(H) \approx 0.22741$, and
$orness(M_{[-2]}) = 3 - 2\sqrt{2}$.
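The values quoted for n = 2 can be reproduced by numerical integration. The sketch below assumes the usual definition of the orness measure as (∫f − ∫min)/(∫max − ∫min) over the unit cube, and uses the fact that for n = 2 the integrals of min and max are 1/3 and 2/3. It uses a plain midpoint rule, and the function names are illustrative.

```python
import math

def integrate_unit_square(f, m=400):
    """Midpoint-rule approximation of the integral of f over [0,1]^2."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        xi = (i + 0.5) * h
        for j in range(m):
            total += f(xi, (j + 0.5) * h)
    return total * h * h

def orness2(f, m=400):
    """Approximate orness of a bivariate averaging function f."""
    int_min, int_max = 1.0 / 3.0, 2.0 / 3.0
    return (integrate_unit_square(f, m) - int_min) / (int_max - int_min)

def harmonic(x, y):
    return 2.0 * x * y / (x + y)

def quadratic(x, y):
    return math.sqrt((x * x + y * y) / 2.0)

orness_h = orness2(harmonic)                      # close to 0.22741
orness_q = orness2(quadratic)                     # close to 0.623225
orness_m = orness2(lambda x, y: (x + y) / 2.0)    # arithmetic mean: 1/2
```

The midpoint nodes never touch the origin, so the harmonic mean is evaluated only where it is defined.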


Definition 2.15 (Dual weighted power mean). Let $M_{w,[r]}$ be a weighted
power mean. The function

$$\bar{M}_{w,[r]}(x) = 1 - M_{w,[r]}(1 - x)$$

is called the dual weighted power mean.


Note 2.16. The dual weighted power mean is obviously a mean (the class of means
is closed under duality). The absorbing element, if any, becomes a = 1. The
extensions of weighted power means satisfy $\bar{M}_{w,[\infty]}(x) = M_{w,[-\infty]}(x)$ and
$\bar{M}_{w,[-\infty]}(x) = M_{w,[\infty]}(x)$. The weighted arithmetic mean $M_w$ is self-dual.
2.3 Weighted quasi-arithmetic means


2.3.1 Definitions


Quasi-arithmetic means generalize power means. Consider a univariate
continuous strictly monotone function $g : [0,1] \to [-\infty, \infty]$, which we call a
generating function. Of course, g is invertible, but it is not necessarily a bijection
(i.e., its range may be $Ran(g) \subset [-\infty, \infty]$).


Definition 2.17 (Quasi-arithmetic mean). For a given generating function
g, the quasi-arithmetic mean is the function

$$M_g(x) = g^{-1}\left(\frac{1}{n}\sum_{i=1}^n g(x_i)\right). \qquad (2.5)$$

Its weighted analogue is given by


Definition 2.18 (Weighted quasi-arithmetic mean). For a given generating
function g, and a weighting vector w, the weighted quasi-arithmetic
mean is the function

$$M_{w,g}(x) = g^{-1}\left(\sum_{i=1}^n w_i g(x_i)\right). \qquad (2.6)$$

The weighted power means are a subclass of weighted quasi-arithmetic
means with the generating function

$$g(t) = \begin{cases} t^r, & \text{if } r \neq 0, \\ \log(t), & \text{if } r = 0. \end{cases}$$

Note 2.19. Observe that if $Ran(g) = [-\infty, \infty]$, then we have the summation
$-\infty + \infty$ or $+\infty - \infty$ if $x_i = 0$ and $x_j = 1$ for some $i \neq j$. When this occurs,
a convention, such as $-\infty + \infty = +\infty - \infty = -\infty$, is adopted, and continuity of
$M_{w,g}$ is lost.


Note 2.20. If the weighting vector w is not normalized, i.e., $W = \sum_{i=1}^n w_i \neq 1$, then
weighted quasi-arithmetic means are expressed as

$$M_{w,g}(x) = g^{-1}\left(\frac{1}{W}\sum_{i=1}^n w_i g(x_i)\right).$$


2.3.2 Main properties 


• Weighted quasi-arithmetic means are continuous if and only if
$Ran(g) \neq [-\infty, \infty]$;

• Weighted quasi-arithmetic means with strictly positive weights are strictly
monotone increasing on $]0,1[^n$;

• The class of weighted quasi-arithmetic means is closed under duality. That
is, given a strong negation N, the N-dual of a weighted quasi-arithmetic
mean $M_{w,g}$ is in turn a weighted quasi-arithmetic mean, given by $M_{w,g \circ N}$.
For the standard negation, the dual of a weighted quasi-arithmetic mean
is characterized by the generating function $h(t) = g(1 - t)$;

• The following result regarding self-duality holds (see, e.g., [209]): Given a
strong negation N, a weighted quasi-arithmetic mean $M_{w,g}$ is N-self-dual
if and only if N is the strong negation generated by g, i.e., if
$N(t) = g^{-1}(g(0) + g(1) - g(t))$ for any $t \in [0,1]$. This implies, in particular:
  - Weighted quasi-arithmetic means, such that $g(0) = \pm\infty$ or $g(1) = \pm\infty$
    are never N-self-dual (in fact, they are dual to each other);
  - Weighted arithmetic means are always self-dual (i.e., N-self-dual with
    respect to the standard negation $N(t) = 1 - t$);

• The generating function is not defined uniquely, but up to an arbitrary
linear transformation, i.e., if g(t) is a generating function of some weighted
quasi-arithmetic mean, then $a g(t) + b$, $a, b \in \Re$, $a \neq 0$ is also a generating
function of the same mean,⁵ provided $Ran(g) \neq [-\infty, \infty]$;

• There are incomparable quasi-arithmetic means. Two quasi-arithmetic
means $M_g$ and $M_h$ satisfy $M_g \le M_h$ if and only if either the composite
$g \circ h^{-1}$ is convex and g is decreasing, or $g \circ h^{-1}$ is concave and g
increasing;

• The only homogeneous weighted quasi-arithmetic means are weighted
power means;

• Weighted quasi-arithmetic means do not have a neutral element.⁶ They
may have an absorbing element only when all the weights are strictly
positive and $g(0) = \pm\infty$ or $g(1) = \pm\infty$, and in such cases the corresponding
absorbing elements are, respectively, a = 0 and a = 1.














2.3.3 Examples 


Example 2.21 (Weighted power means). Weighted power means are a special
case of weighted quasi-arithmetic means, with $g(t) = t^r$, $r \neq 0$ and
$g(t) = \log(t)$ if r = 0. Note that the generating function $g(t) = \frac{t^r - 1}{r}$ defines
exactly the same power mean (as a particular case of a linear transformation of g).
Also note the similarity of the latter to the additive generators of the
Schweizer-Sklar family of triangular norms, p. 150. By taking $g(t) = (1 - t)^r$,
$r \neq 0$ and $g(t) = \log(1 - t)$ if r = 0, we obtain the family of dual weighted
power means, which are related to Yager triangular norms, p. 156.


5 For this reason, one can assume that g is monotone increasing, as otherwise we 
can simply take —g. 
6 Observe that the limiting cases min and max are not quasi-arithmetic means. 

Example 2.22 (Harmonic and geometric means). These classical means,
defined on page 43, are special cases of the power means, obtained when
$g(t) = t^r$, r = −1 and r = 0 respectively.


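Example 2.23. Let $g(t) = \log\frac{t}{1-t}$. The corresponding quasi-arithmetic mean
$M_g$ is given by

$$M_g(x) = \begin{cases} \dfrac{\prod_{i=1}^n x_i^{1/n}}{\prod_{i=1}^n x_i^{1/n} + \prod_{i=1}^n (1 - x_i)^{1/n}}, & \text{if } \{0,1\} \not\subset \{x_1,\ldots,x_n\}, \\ 0, & \text{otherwise,} \end{cases}$$

that is, $M_g(x) = \frac{G(x)}{G(x) + G(1 - x)}$, with the convention $\frac{0}{0} = 0$.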


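Example 2.24 (Weighted trigonometric means). Let $g_1(t) = \sin(\frac{\pi}{2}t)$,
$g_2(t) = \cos(\frac{\pi}{2}t)$, and $g_3(t) = \tan(\frac{\pi}{2}t)$ be the generating functions. The
weighted trigonometric means are the functions

$$SM_w(x) = \frac{2}{\pi}\arcsin\left(\sum_{i=1}^n w_i \sin\left(\frac{\pi}{2}x_i\right)\right),$$

$$CM_w(x) = \frac{2}{\pi}\arccos\left(\sum_{i=1}^n w_i \cos\left(\frac{\pi}{2}x_i\right)\right) \text{ and}$$

$$TM_w(x) = \frac{2}{\pi}\arctan\left(\sum_{i=1}^n w_i \tan\left(\frac{\pi}{2}x_i\right)\right).$$

Their 3D plots are presented on Figures 2.6 and 2.7.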


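Example 2.25 (Weighted exponential means). Let the generating function be

$$g(t) = \begin{cases} \gamma^t, & \text{if } \gamma \neq 1, \\ t, & \text{if } \gamma = 1. \end{cases}$$

The weighted exponential mean is the function

$$EM_{w,\gamma}(x) = \begin{cases} \log_\gamma\left(\sum_{i=1}^n w_i \gamma^{x_i}\right), & \text{if } \gamma \neq 1, \\ M_w(x), & \text{if } \gamma = 1. \end{cases}$$

3D plots of some weighted exponential means are presented on Figures 2.8
and 2.9.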


Example 2.26. There is another mean also known as exponential [40], given
for $x \ge 1$ by

$$f(x) = \exp\left(\left(\prod_{i=1}^n \log(x_i)\right)^{1/n}\right).$$

It is a quasi-arithmetic mean with a generating function $g(t) = \log(\log(t))$,
and its inverse $g^{-1}(t) = \exp(\exp(t))$.
In the domain $[0,1]^n$ one can use a generating function $g(t) = \log(-\log(t))$,
so that its inverse is $g^{-1}(t) = \exp(-\exp(t))$. This mean is discontinuous, since
$Ran(g) = [-\infty, \infty]$. We obtain the expression

$$f(x) = \exp\left(-\left(\prod_{i=1}^n (-\log(x_i))\right)^{1/n}\right).$$

Fig. 2.7. 3D plots of weighted trigonometric means.
Fig. 2.9. 3D plots of exponential means with γ = 0.5 and γ = 100.


Example 2.27 (Weighted radical means). Let $\gamma > 0$, $\gamma \neq 1$, and let the
generating function be

$$g(t) = \gamma^{1/t}.$$

The weighted radical mean is the function

$$RM_{w,\gamma}(x) = \left(\log_\gamma\left(\sum_{i=1}^n w_i \gamma^{1/x_i}\right)\right)^{-1}.$$

3D plots of some radical means are presented on Figure 2.10.


Example 2.28 (Weighted basis-exponential means). Weighted basis-exponential
means are obtained by using the generating function $g(t) = t^t$ and $t \ge \frac{1}{e}$ (this
generating function is decreasing on $[0, \frac{1}{e}[$ and increasing on $]\frac{1}{e}, \infty]$, hence the
restriction). The value of this mean is such a value y that

$$y^y = \sum_{i=1}^n w_i x_i^{x_i}.$$

For practical purposes this equation has to be solved for y numerically.

Fig. 2.10. 3D plots of radical means with γ = 0.5 and γ = 100.
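One simple way to solve the equation $y^y = \sum w_i x_i^{x_i}$ numerically is bisection, since $t^t$ is increasing for $t \ge 1/e$. The sketch below is one possible approach, not a method prescribed by the text. The function name and the tolerance are assumptions, and the inputs are restricted to $[1/e, 1]$.

```python
import math

E_INV = 1.0 / math.e   # t**t is increasing for t >= 1/e

def basis_exponential_mean(x, w, tol=1e-12):
    """Solve y**y = sum_i w_i * x_i**x_i for y by bisection."""
    if any(xi < E_INV or xi > 1.0 for xi in x):
        raise ValueError("inputs must lie in [1/e, 1]")
    target = sum(wi * xi ** xi for wi, xi in zip(w, x))
    lo, hi = min(x), max(x)      # a mean lies between the extreme inputs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid ** mid < target:
            lo = mid             # root lies to the right of mid
        else:
            hi = mid             # root lies to the left of mid (or at mid)
    return 0.5 * (lo + hi)

y = basis_exponential_mean([0.5, 0.9], [0.5, 0.5])
```

The bracket `[min(x), max(x)]` is valid because the target is a convex combination of the values $x_i^{x_i}$ and the generator is monotone on the restricted domain.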


Example 2.29 (Weighted basis-radical means). Weighted basis-radical means
are obtained by using the generator $g(t) = t^{1/t}$ and $t \ge \frac{1}{e}$ (restriction for the
same reason as in the Example 2.28). The value of this mean is such a value
y that

$$y^{1/y} = \sum_{i=1}^n w_i x_i^{1/x_i}.$$

For practical purposes this equation has to be solved for y numerically.


2.3.4 Calculation 


Generating functions offer a nice way of calculating the values of weighted
quasi-arithmetic means. Note that we can write

$$M_{w,g}(x) = g^{-1}(M_w(g(x))),$$

where $g(x) = (g(x_1),\ldots,g(x_n))$. Thus calculation can be performed in three
steps:


1. Transform all the inputs by calculating vector g(x);
2. Calculate the weighted arithmetic mean of the transformed inputs;
3. Calculate the inverse $g^{-1}$ of the computed mean.

However one needs to be careful with the limiting cases, for example when
$g(x_i)$ becomes infinite. Typically this is an indication of existence of an
absorbing element, and this needs to be picked up by the computer subroutine.
Similarly, special cases like $M_{[r]}(x)$, $r \to \pm\infty$ have to be accommodated (in
these cases the subroutine has to return the minimum or the maximum).
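The three steps, together with the check for infinite values of $g(x_i)$, can be sketched as follows. This is an illustration under stated assumptions: the generator `g` and its inverse `g_inv` are supplied by the caller, weights are non-negative and sum to one, and when both $-\infty$ and $+\infty$ occur the convention $-\infty + \infty = -\infty$ is applied. All function names are chosen here.

```python
import math

def quasi_arithmetic_mean(x, w, g, g_inv):
    """Weighted quasi-arithmetic mean computed in three steps."""
    # Step 1: transform the inputs that carry positive weight.
    transformed = [(wi, g(xi)) for wi, xi in zip(w, x) if wi > 0]

    # Limiting case: an infinite g(x_i) signals an absorbing element.
    infinite = {gx for _, gx in transformed if math.isinf(gx)}
    if len(infinite) == 2:
        # Both -inf and +inf occur: use the convention -inf + inf = -inf.
        return g_inv(-math.inf)
    if infinite:
        return g_inv(infinite.pop())

    # Step 2: weighted arithmetic mean of the transformed inputs.
    mean = sum(wi * gx for wi, gx in transformed)
    # Step 3: map the result back with the inverse generator.
    return g_inv(mean)

def log_gen(t):
    """Generator of the geometric mean, with g(0) = -inf."""
    return math.log(t) if t > 0 else -math.inf

def exp_inv(s):
    """Inverse of log_gen, mapping -inf back to 0."""
    return 0.0 if s == -math.inf else math.exp(s)

geo = quasi_arithmetic_mean([0.25, 1.0], [0.5, 0.5], log_gen, exp_inv)
absorbed = quasi_arithmetic_mean([0.0, 0.9], [0.5, 0.5], log_gen, exp_inv)
```

With the logarithmic generator the routine reproduces the geometric mean, and a zero input returns the absorbing element 0.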





2.3.5 Weighting triangles 


When we are interested in using weighted quasi-arithmetic means as extended
aggregation functions, we need to have a clear rule as to how the weighting
vectors are calculated for each dimension n = 2, 3, .... For symmetric
quasi-arithmetic means we have a simple rule: for each n the weighting vector
$w^n = (\frac{1}{n},\ldots,\frac{1}{n})$. For weighted means we need the concept of a weighting
triangle.

Definition 2.30 (Weighting triangle). A weighting triangle or triangle of
weights is a set of numbers $w_i^n \in [0,1]$, for i = 1,...,n and $n \ge 1$, such that
$\sum_{i=1}^n w_i^n = 1$, for all $n \ge 1$. It will be represented in the following form

1
$w_1^2$  $w_2^2$
$w_1^3$  $w_2^3$  $w_3^3$
$w_1^4$  $w_2^4$  $w_3^4$  $w_4^4$
...

Weighting triangles will be denoted by $\triangle w_i^n$.


Example 2.31. A basic example is the “normalized” Pascal triangle 


1 
1/2 1/2 
1/4 2/4 1/4 
1/8 3/8 3/8 1/8 
1/16 4/16 6/16 4/16 1/16 


The generic formula for the weighting vector of dimension n in this weighting
triangle is

$$w^n = \frac{1}{2^{n-1}}\left(\binom{n-1}{0}, \binom{n-1}{1}, \ldots, \binom{n-1}{n-1}\right)$$

for each $n \ge 1$.⁷
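The rows of the normalized Pascal triangle are easy to generate exactly. The sketch below uses Python's `fractions.Fraction` and `math.comb`. The function name `pascal_weights` is illustrative.

```python
from fractions import Fraction
from math import comb

def pascal_weights(n):
    """Row n (n >= 1) of the normalized Pascal weighting triangle."""
    return [Fraction(comb(n - 1, i), 2 ** (n - 1)) for i in range(n)]

# First five rows of the weighting triangle, as exact fractions.
triangle = [pascal_weights(n) for n in range(1, 6)]
```

Each row sums to exactly one, so every row is a valid weighting vector.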


It is possible to generate weighting triangles in different ways fad}: 


7 Recall $\binom{n}{m} = \frac{n!}{m!(n-m)!}$.
Proposition 2.32. The following methods generate weighting triangles:

1. Let $\lambda_1, \lambda_2, \ldots \ge 0$ be a sequence of non-negative real numbers such that
$\lambda_1 > 0$. Define the weights using

$$w_i^n = \frac{\lambda_{n-i+1}}{\lambda_1 + \ldots + \lambda_n},$$

for all i = 1,...,n and $n \ge 1$;
2. Let N be a strong negation.⁸ Generate the weights using N by

$$w_i^n = N\left(\frac{i-1}{n}\right) - N\left(\frac{i}{n}\right),$$

for all i = 1,...,n and $n \ge 1$;
3. Let Q be a monotone non-decreasing function $Q : [0,1] \to [0,1]$ such that⁹
Q(0) = 0 and Q(1) = 1. Generate the weights using function Q by

$$w_i^n = Q\left(\frac{i}{n}\right) - Q\left(\frac{i-1}{n}\right),$$

for all i = 1,...,n and $n \ge 1$.
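A weighting triangle generated from a quantifier can be sketched as follows. The sketch assumes the commonly used rule $w_i^n = Q(i/n) - Q((i-1)/n)$. The quantifier `Q_square` is a hypothetical example chosen here, not one taken from the text.

```python
def quantifier_weights(Q, n):
    """Weights w_i = Q(i/n) - Q((i-1)/n), i = 1..n, for a quantifier Q."""
    return [Q(i / n) - Q((i - 1) / n) for i in range(1, n + 1)]

def Q_square(t):
    # Hypothetical quantifier: non-decreasing, Q(0) = 0 and Q(1) = 1.
    return t * t

# Rows n = 1..4 of the resulting weighting triangle.
triangle = [quantifier_weights(Q_square, n) for n in range(1, 5)]
```

Because the differences telescope, each row sums to Q(1) − Q(0) = 1, and monotonicity of Q keeps every weight non-negative.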

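Another way to construct weighting triangles is by using fractal structures
exemplified below. Such weighting triangles cannot be generated by any of
the methods in Proposition 2.32.
Example 2.33. The following two triangles belong to the Sierpinski family

1
$1\cdot\frac{1}{4}$  $3\cdot\frac{1}{4}$
$1\cdot\frac{1}{4}$  $3\cdot\frac{1}{4^2}$  $3^2\cdot\frac{1}{4^2}$
$1\cdot\frac{1}{4}$  $3\cdot\frac{1}{4^2}$  $3^2\cdot\frac{1}{4^3}$  $3^3\cdot\frac{1}{4^3}$
...

and

1
$1\cdot\frac{1}{9}$  $8\cdot\frac{1}{9}$
$1\cdot\frac{1}{9}$  $8\cdot\frac{1}{9^2}$  $8^2\cdot\frac{1}{9^2}$
$1\cdot\frac{1}{9}$  $8\cdot\frac{1}{9^2}$  $8^2\cdot\frac{1}{9^3}$  $8^3\cdot\frac{1}{9^3}$
...


8 See Definition [48] on p. [8]


9 This is a so-called quantifier, see p.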





In general, given two real numbers a,b such that a < b and ¢ # 
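possible to define a weighting triangle by

1
$\frac{a}{b}$  $\frac{b-a}{b}$
$\frac{a}{b}$  $\frac{a}{b}\cdot\frac{b-a}{b}$  $\left(\frac{b-a}{b}\right)^2$
$\frac{a}{b}$  $\frac{a}{b}\cdot\frac{b-a}{b}$  $\frac{a}{b}\left(\frac{b-a}{b}\right)^2$  $\left(\frac{b-a}{b}\right)^3$
...

This weighting triangle also belongs to the Sierpinski family, and a generic
formula for the weights is

$$w_i^n = \frac{a}{b}\left(\frac{b-a}{b}\right)^{i-1}, \; i = 1,\ldots,n-1, \quad \text{and} \quad w_n^n = \left(\frac{b-a}{b}\right)^{n-1}.$$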

Let us now mention two characterization theorems, which relate continuity, 
strict monotonicity and the properties of decomposability and bisymmetry to 
the class of weighted quasi-arithmetic means. 








Theorem 2.34 (Kolmogorov-Nagumo). An extended aggregation function
F is continuous, decomposable,¹⁰ and strictly monotone if and only
if there is a monotone bijection $g : [0,1] \to [0,1]$, such that for each n > 1, $f_n$
is a quasi-arithmetic mean $M_g$.


The next result is a generalized version of the Kolmogorov and Nagumo
characterization, due to Aczél.


Theorem 2.35. An extended aggregation function F is continuous,
bisymmetric,¹¹ idempotent, and strictly monotone if and only if there is a
monotone bijection $g : [0,1] \to [0,1]$, and a weighting triangle $\triangle w_i^n$ with all positive
weights, so that for each n > 1, $f_n$ is a weighted quasi-arithmetic mean $M_{w^n,g}$
(i.e., $f_n = M_{w^n,g}$).


Note 2.36. If we omit the strict monotonicity of F, we recover the class of non-strict
means introduced by Fodor and Marichal [o6). 


2.3.6 Weights dispersion 


An important quantity associated with weighting vectors is their dispersion, 
also called entropy. 


10 See Definition 1.42. Continuity and decomposability imply idempotency.
11 See Definition 1.43.
Definition 2.37 (Weights dispersion (entropy)). For a given weighting
vector w its measure of dispersion (entropy) is

$$Disp(w) = -\sum_{i=1}^n w_i \log w_i, \qquad (2.7)$$

with the convention $0 \cdot \log 0 = 0$.


The weights dispersion measures the degree to which a weighted
aggregation function f takes into account all inputs. For example, in the
case of weighted means, among the two weighting vectors $w_1 = (0,1)$ and
$w_2 = (0.5, 0.5)$ the second one may be preferable, since the corresponding
weighted mean uses information from two sources rather than a single source,
and is consequently less sensitive to input inaccuracies.

A useful normalization of this measure is

$$-\frac{1}{\log n}\sum_{i=1}^n w_i \log w_i.$$

Along with the orness value (p. 40), the weights entropy is an important
parameter in choosing weighting vectors of both quasi-arithmetic means and
OWA functions (see p. 70).

There are also other entropy measures (e.g., Rényi entropy)¹² frequently
used in studies of weighted aggregation functions, e.g., [266].
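The dispersion (2.7) and its normalized version translate directly into code. The following is a minimal sketch with illustrative function names. It applies the convention 0 · log 0 = 0 by skipping zero weights, and it returns 0 for the degenerate case n = 1, which is a choice made here.

```python
import math

def dispersion(w):
    """Disp(w) = -sum_i w_i log w_i, with the convention 0 * log 0 = 0."""
    return -sum(wi * math.log(wi) for wi in w if wi > 0)

def normalized_dispersion(w):
    """Dispersion divided by log n, so that the result lies in [0, 1]."""
    n = len(w)
    return dispersion(w) / math.log(n) if n > 1 else 0.0

single_source = dispersion([0.0, 1.0])          # no dispersion
two_sources = normalized_dispersion([0.5, 0.5])  # maximal dispersion
```

The vector (0, 1) has zero dispersion, while equal weights reach the maximum normalized value of one.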


2.3.7 How to choose weights 
Choosing weights of weighted arithmetic means 


In each application the weighting vector of the weighted arithmetic mean will
be different. We examine the problem of choosing the weighting vector which
fits best some empirical data, the pairs $(x_k, y_k)$, k = 1,...,K. Our goal is to
determine the best weighted arithmetic mean that minimizes the norm of the
differences between the predicted ($f(x_k)$) and observed ($y_k$) values. We will
use the least squares or least absolute deviation criterion, as discussed on
p. 33. In the first case we have the following optimization problem

$$\min \sum_{k=1}^K \left(\sum_{i=1}^n w_i x_{ik} - y_k\right)^2 \qquad (2.8)$$

$$\text{s.t.} \; \sum_{i=1}^n w_i = 1, \; w_i \ge 0, \; i = 1,\ldots,n.$$


12 These measures of entropy can be obtained by relaxing the subadditivity condition
which characterizes Shannon entropy.
It is easy to recognize a standard quadratic programming problem (QP),
with a convex objective function. There are plenty of standard methods for
its solution, discussed in the Appendix.

We mentioned on p. 33 that one can use a different fitting criterion, such as
the least absolute deviation (LAD) criterion, which translates into a different
optimization problem

$$\min \sum_{k=1}^K \left|\sum_{i=1}^n w_i x_{ik} - y_k\right| \qquad (2.9)$$

$$\text{s.t.} \; \sum_{i=1}^n w_i = 1, \; w_i \ge 0, \; i = 1,\ldots,n.$$

This problem is subsequently converted into a linear programming problem
(LP) as discussed in the Appendix A.2.

Particular attention is needed for the case when the quadratic (resp. linear)
programming problems have singular matrices. Such cases appear when
there are few data, or when the input values are linearly dependent. While
modern quadratic and linear programming methods accommodate such
cases, the minimization problem will typically have multiple solutions. An
additional criterion is then used to select one of these solutions, and typically
this criterion relates to the dispersion of weights, or the entropy [235], as
defined in Definition 2.37. Torra proposes to solve an auxiliary univariate
optimization problem to maximize weights dispersion, subject to a given value
of (2.8).


Specifically, one solves the problem

$$\min \sum_{i=1}^n w_i \log w_i \qquad (2.10)$$

$$\text{s.t.} \; \sum_{i=1}^n w_i = 1, \; w_i \ge 0, \; i = 1,\ldots,n,$$

$$\sum_{k=1}^K \left(\sum_{i=1}^n w_i x_{ik} - y_k\right)^2 = A,$$

where A is the value of the solution of problem (2.8). It turns out that if
Problem (2.8) has multiple solutions, they are expressed in parametric form
as linear combinations of one another. Further, the objective function in (2.10)
is convex. Therefore problem (2.10) is a convex programming problem subject
to linear constraints, and it can be solved by standard methods, see [235].

A different additional criterion is the so-called measure of orness (discussed
in Section 2.1), which measures how far a given averaging function is from the
max function, which is the weakest disjunctive function. It is applicable to
any averaging function, and is frequently used as an additional constraint or
criterion when constructing these functions. However, for any weighted
arithmetic mean, the measure of orness is always $\frac{1}{2}$, therefore this parameter does
not discriminate between arithmetic means with different weighting vectors.
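To make the least squares problem (2.8) concrete, the sketch below fits the weights with projected gradient descent onto the simplex $\{w : w_i \ge 0, \sum w_i = 1\}$. This technique is swapped in here to keep the example free of external solvers. It is not the QP method referred to in the text, and a dedicated QP routine would normally be preferable. All names are illustrative, and the data are assumed to contain at least one non-zero input.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum_i w_i = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:           # holds exactly for the leading entries
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def fit_weights(X, y, iterations=5000):
    """Least squares fit of a weighted arithmetic mean, problem (2.8).

    X is a list of K input vectors of length n, y the K observed outputs.
    """
    n = len(X[0])
    # Step size 1/L, where L bounds the largest eigenvalue of 2 X^T X.
    L = 2.0 * sum(xi * xi for row in X for xi in row)
    w = [1.0 / n] * n
    for _ in range(iterations):
        residuals = [sum(wi * xi for wi, xi in zip(w, row)) - yk
                     for row, yk in zip(X, y)]
        grad = [2.0 * sum(r * row[i] for r, row in zip(residuals, X))
                for i in range(n)]
        w = project_to_simplex([wi - gi / L for wi, gi in zip(w, grad)])
    return w

# Synthetic data generated from a known weighting vector.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.2, 0.3, 0.9]]
w_true = [0.2, 0.5, 0.3]
y = [sum(wi * xi for wi, xi in zip(w_true, row)) for row in X]
w_fit = fit_weights(X, y)
```

On this noise-free data the fitted vector `w_fit` recovers `w_true`. With few or linearly dependent data the minimizer is not unique, which is the situation where the dispersion criterion discussed above becomes relevant.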
Preservation of ordering of the outputs


We recall from Section 1.6, p. 34, that sometimes one not only has to fit an
aggregation function to the numerical data, but also preserve the ordering of
the outputs. That is, if $y_j \le y_k$ then we expect $f(x_j) \le f(x_k)$.

First, arrange the data, so that the outputs are in non-decreasing order,
i.e., $y_k \le y_{k+1}$, k = 1,...,K − 1. Define the additional linear constraints

$$\langle x_{k+1} - x_k, w \rangle = \sum_{i=1}^n w_i (x_{i,k+1} - x_{ik}) \ge 0,$$

k = 1,...,K − 1. We add the above constraints to problem (2.8) or (2.9)
and solve it. The addition of an extra K − 1 constraints neither changes the
structure of the optimization problem, nor drastically affects its complexity.


Choosing weights of weighted quasi-arithmetic means 


Consider the case of weighted quasi-arithmetic means, when the generating
function g is given. As before, we have a data set $(x_k, y_k)$, k = 1,...,K, and
we are interested in finding the weighting vector w that fits the data best.
When we use the least squares, as discussed on p. 33, we have the following
optimization problem

$$\min \sum_{k=1}^K \left(g^{-1}\left(\sum_{i=1}^n w_i g(x_{ik})\right) - y_k\right)^2 \qquad (2.11)$$

$$\text{s.t.} \; \sum_{i=1}^n w_i = 1, \; w_i \ge 0, \; i = 1,\ldots,n.$$

This is a nonlinear optimization problem, but it can be reduced to quadratic
programming by the following artifice. Let us apply g to $y_k$ and the inner sum
in (2.11). We obtain

$$\min \sum_{k=1}^K \left(\sum_{i=1}^n w_i g(x_{ik}) - g(y_k)\right)^2 \qquad (2.12)$$

$$\text{s.t.} \; \sum_{i=1}^n w_i = 1, \; w_i \ge 0, \; i = 1,\ldots,n.$$

We recognize a standard quadratic programming problem (QP), with a
convex objective function. This approach was discussed in detail in
[30, 235]. There are plenty of standard methods of solution, discussed in the
Appendix A.5.

If one uses the least absolute deviation (LAD) criterion (p. 33) we obtain
a different optimization problem

$$\min \sum_{k=1}^K \left|\sum_{i=1}^n w_i g(x_{ik}) - g(y_k)\right| \qquad (2.13)$$

$$\text{s.t.} \; \sum_{i=1}^n w_i = 1, \; w_i \ge 0, \; i = 1,\ldots,n.$$

This problem is subsequently converted into a linear programming problem
(LP) as discussed in the Appendix A.2.

As in the case of weighted arithmetic means, in the presence of multiple
optimal solutions, one can use an additional criterion of the dispersion of
weights [235].


Preservation of ordering of the outputs 


Similarly to what we did for weighted arithmetic means (see also Section 1.6,
p. 34), we will require that the ordering of the outputs is preserved, i.e., if
$y_j \le y_k$ then we expect $f(x_j) \le f(x_k)$. We arrange the data, so that the
outputs are in non-decreasing order, $y_k \le y_{k+1}$, k = 1,...,K − 1. Then we
define the additional linear constraints

$$\langle g(x_{k+1}) - g(x_k), w \rangle = \sum_{i=1}^n w_i (g(x_{i,k+1}) - g(x_{ik})) \ge 0,$$

k = 1,...,K − 1. We add the above constraints to problem (2.12) or (2.13)
and solve it. The addition of an extra K − 1 constraints neither changes the
structure of the optimization problem, nor drastically affects its complexity.


Choosing generating functions 


Consider now the case when the generating function g is also unknown, and
hence needs to be found based on the data. We study two cases: a) when g is
given algebraically, with one or more unknown parameters to estimate (e.g.,
$g_p(t) = t^p$, p unknown) and b) when no specific algebraic form of g is given.
In the first case we solve the problem

$$\min_{p,w} \sum_{k=1}^K \left(\sum_{i=1}^n w_i g_p(x_{ik}) - g_p(y_k)\right)^2 \qquad (2.14)$$

$$\text{s.t.} \; \sum_{i=1}^n w_i = 1, \; w_i \ge 0, \; i = 1,\ldots,n,$$

plus conditions on p.


While this general optimization problem is non-convex and nonlinear (i.e.,
difficult to solve), we can convert it to a bi-level optimization problem (see
Appendix A.5.3)

$$\min_p \left[\min_w \sum_{k=1}^K \left(\sum_{i=1}^n w_i g_p(x_{ik}) - g_p(y_k)\right)^2\right] \qquad (2.15)$$

$$\text{s.t.} \; \sum_{i=1}^n w_i = 1, \; w_i \ge 0, \; i = 1,\ldots,n,$$

plus conditions on p.


The problem at the inner level is the same as (2.12) with a fixed $g_p$, which
is a QP problem. At the outer level we have a global optimization problem
with respect to a single parameter p. It is solved by using one of the methods
discussed in the Appendix. We recommend the deterministic Pijavski-Shubert
method.


Example 2.38. Determine the weights and the generating function of a family
of weighted power means. We have $g_p(t) = t^p$, and hence solve the bi-level
optimization problem

$$\min_{p \in [-\infty, \infty]} \; \min_w \sum_{k=1}^K \left(\sum_{i=1}^n w_i x_{ik}^p - y_k^p\right)^2 \qquad (2.16)$$

$$\text{s.t.} \; \sum_{i=1}^n w_i = 1, \; w_i \ge 0, \; i = 1,\ldots,n.$$

Of course, for numerical purposes we need to limit the range for p to a finite
interval, and treat all the limiting cases $p \to \pm\infty$, $p \to 0$ and $p \to -1$.
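The bi-level structure of (2.16) can be illustrated for n = 2, where the inner problem has a closed-form solution. The sketch below is a simplification under stated assumptions: a plain scan over a finite grid of exponents stands in for a global optimization method such as Pijavski-Shubert, the limiting cases p → 0 and p → ±∞ are not handled, inputs are assumed strictly positive, and all names are chosen here.

```python
def inner_fit(X, y, p):
    """Inner problem for n = 2: best w = (w1, 1 - w1) for a fixed p.

    With w2 = 1 - w1 the residual is w1 * a_k + b_k, so the objective is a
    one-variable quadratic whose minimum is clipped to [0, 1].
    """
    a = [x1 ** p - x2 ** p for x1, x2 in X]
    b = [x2 ** p - yk ** p for (_, x2), yk in zip(X, y)]
    denom = sum(ak * ak for ak in a)
    w1 = 0.5 if denom == 0 else -sum(ak * bk for ak, bk in zip(a, b)) / denom
    w1 = min(1.0, max(0.0, w1))
    error = sum((w1 * ak + bk) ** 2 for ak, bk in zip(a, b))
    return error, (w1, 1.0 - w1)

def fit_power_mean(X, y, p_grid):
    """Outer problem: scan candidate exponents and keep the best fit."""
    best = None
    for p in p_grid:
        error, w = inner_fit(X, y, p)
        if best is None or error < best[0]:
            best = (error, p, w)
    return best

# Synthetic data generated by a weighted quadratic mean (p = 2).
w_true, p_true = (0.3, 0.7), 2.0
X = [(0.2, 0.9), (0.5, 0.4), (0.8, 0.3), (0.6, 0.7), (0.1, 0.5)]
y = [(w_true[0] * x1 ** p_true + w_true[1] * x2 ** p_true) ** (1 / p_true)
     for x1, x2 in X]
error, p_best, w_best = fit_power_mean(X, y, [0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
```

On this noise-free data the scan selects p = 2 and recovers the weights. With real data the grid would need to be refined, and for larger n the inner step would be a proper QP solve.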





A different situation arises when the parametric form of g is not given. The
proposed approach is based on approximation of g with a monotone
linear spline (see Appendix A.3, p. 309), as

$$g(t) = \sum_{j=1}^J c_j B_j(t), \qquad (2.17)$$

where $B_j$ are appropriately chosen basis functions, and $c_j$ are spline
coefficients. The monotonicity of g is ensured by imposing linear restrictions
on spline coefficients, in particular non-negativity, as in [13]. Further, since
the generating function is defined up to an arbitrary linear transformation,
one has to fix a particular g by specifying two interpolation conditions, like
$g(a) = 0$, $g(b) = 1$, $a, b \in ]0,1[$, and if necessary, properly model asymptotic
behavior if g(0) or g(1) are infinite.

After rearranging the terms of the sum, the problem of identification
becomes (subject to linear conditions on c, w)

$$\min_{c,w} \sum_{k=1}^K \left(\sum_{j=1}^J c_j \left[\sum_{i=1}^n w_i B_j(x_{ik}) - B_j(y_k)\right]\right)^2. \qquad (2.18)$$
For a fixed c (i.e., fixed g) we have a quadratic programming problem to find
w, and for a fixed w, we have a quadratic programming problem to find c.
However if we consider both c, w as variables, we obtain a difficult global
optimization problem. We convert it into a bi-level optimization problem

$$\min_c \min_w \sum_{k=1}^K \left(\sum_{j=1}^J c_j \left[\sum_{i=1}^n w_i B_j(x_{ik}) - B_j(y_k)\right]\right)^2, \qquad (2.19)$$

where at the inner level we have a QP problem and at the outer level we have
a nonlinear problem with multiple local minima. When the number of spline
coefficients J is not very large (< 10), this problem can be efficiently solved
by using deterministic global optimization methods from Appendix A.5.5. If
the number of variables is small and J is large, then reversing the order of
minimization (i.e., using $\min_w \min_c$) is more efficient.


2.4 Other means 


Besides weighted quasi-arithmetic means, there exist very large families of
other means, some of which we will mention in this section. A comprehensive
reference to the topic of means is [40]. However we must note that not all
these means are monotone functions, so they cannot be used as aggregation
functions. Still some members of these families are aggregation functions, and
we will mention the sufficient conditions for monotonicity, if available. Most
of the mentioned means do not require $x \in [0,1]^n$, but we will assume $x \ge 0$.


2.4.1 Gini means 


Definition 2.39 (Gini mean). Let $p, q \in \Re$ and $w \in \Re^n$, $w \ge 0$. Weighted
Gini mean is the function

$$G_w^{p,q}(x) = \begin{cases} \left(\dfrac{\sum_{i=1}^n w_i x_i^p}{\sum_{i=1}^n w_i x_i^q}\right)^{1/(p-q)}, & \text{if } p \neq q, \\[2ex] \left(\prod_{i=1}^n x_i^{w_i x_i^p}\right)^{1/\sum_{i=1}^n w_i x_i^p}, & \text{if } p = q. \end{cases} \qquad (2.20)$$

Properties

• $G_w^{p,q} = G_w^{q,p}$, so we assume $p \ge q$;
• $\lim_{p \to q} G_w^{p,q} = G_w^{q,q}$;
• $\lim_{p \to \infty} G_w^{p,q}(x) = \max(x)$;
• $\lim_{q \to -\infty} G_w^{p,q}(x) = \min(x)$;
• If $p_1 \le p_2$, $q_1 \le q_2$, then $G_w^{p_1,q_1} \le G_w^{p_2,q_2}$.

Special cases

• Setting q = 0 and $p \ge 0$ leads to weighted power means $G_w^{p,0} = M_{w,[p]}$;
• Setting p = 0 and $q \le 0$ also leads to weighted power means $G_w^{0,q} = M_{w,[q]}$;
• Setting q = p − 1 leads to counter-harmonic means, also called Lehmer
means. For example, when n = 2,
$G_{(\frac{1}{2},\frac{1}{2})}^{q+1,q}(x_1, x_2) = \frac{x_1^{q+1} + x_2^{q+1}}{x_1^q + x_2^q}$, $q \in \Re$;
• When q = 1 we obtain the contraharmonic mean
$G_{(\frac{1}{2},\frac{1}{2})}^{2,1}(x_1, x_2) = \frac{x_1^2 + x_2^2}{x_1 + x_2}$.


Note 2.40. Counter-harmonic means (and hence Gini means in general) are not
monotone, except special cases (e.g., power means).
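The lack of monotonicity is easy to observe on the contraharmonic mean for n = 2. In the short sketch below (function name illustrative), increasing the first argument from 0 to 0.5 while keeping the second fixed at 1 decreases the output.

```python
def contraharmonic(x1, x2):
    """Gini mean with p = 2, q = 1 and equal weights, for x1 + x2 > 0."""
    return (x1 * x1 + x2 * x2) / (x1 + x2)

low = contraharmonic(0.0, 1.0)    # value at (0, 1)
high = contraharmonic(0.5, 1.0)   # value after increasing the first input
```

Since `high < low` although the inputs did not decrease, this mean is not an aggregation function on the whole domain.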





Fig. 2.11. 3D plots of weighted Gini means (both are weighted
contraharmonic means). Note lack of monotonicity.


2.4.2 Bonferroni means 


Definition 2.41 (Bonferroni mean [36]). Let $p, q \ge 0$ and $x \ge 0$. Bonferroni
mean is the function

$$B^{p,q}(x) = \left(\frac{1}{n(n-1)}\sum_{i,j=1,\, i \neq j}^n x_i^p x_j^q\right)^{1/(p+q)}. \qquad (2.21)$$

Extension to $B^{p,q,r}(x)$, etc., is obvious.


It is an aggregation function.

Fig. 2.12. 3D plots of weighted Gini means (both are weighted
counter-harmonic means).

Fig. 2.13. 3D plots of weighted Gini means.
2.4.3 Heronian mean


Definition 2.42 (Heronian mean). Heronian mean is the function

$$HR(x) = \frac{2}{n(n+1)}\sum_{i=1}^n \sum_{j=i}^n \sqrt{x_i x_j}. \qquad (2.22)$$

It is an aggregation function. For n = 2 we have $HR = \frac{1}{3}(2M + G)$.

Fig. 2.14. 3D plots of weighted Gini means.

Fig. 2.15. 3D plots of Bonferroni means.


2.4.4 Generalized logarithmic means 


Definition 2.43 (Generalized logarithmic mean). Let n = 2, x, y > 0,
$x \neq y$ and $p \in [-\infty, \infty]$. The generalized logarithmic mean is the function

$$L^p(x,y) = \begin{cases} \dfrac{y - x}{\log y - \log x}, & \text{if } p = -1, \\[1.5ex] \dfrac{1}{e}\left(\dfrac{y^y}{x^x}\right)^{1/(y-x)}, & \text{if } p = 0, \\[1.5ex] \min(x,y), & \text{if } p = -\infty, \\ \max(x,y), & \text{if } p = \infty, \\ \left(\dfrac{y^{p+1} - x^{p+1}}{(p+1)(y-x)}\right)^{1/p}, & \text{otherwise.} \end{cases} \qquad (2.23)$$

For x = y, $L^p(x,x) = x$.

Fig. 2.16. 3D plots of a Bonferroni mean and the Heronian mean.


Note 2.44. Generalized logarithmic means are also called Stolarsky means,
sometimes $L^p$ is called $L^{p+1}$.


Note 2.45. The generalized logarithmic mean is symmetric. The limiting cases x = 0
depend on p, although $L^p(0,0) = 0$.


Special cases

• The function $L^0(x,y)$ is called identric mean;
• $L^{-2}(x,y) = G(x,y)$, the geometric mean;
• $L^{-1}$ is called the logarithmic mean;
• $L^{-1/2}$ is the power mean with exponent 1/2, since
$L^{-1/2}(x,y) = \left(\frac{\sqrt{x} + \sqrt{y}}{2}\right)^2$;
• $L^1$ is the arithmetic mean;
• Only $L^{-1/2}$, $L^{-2}$ and $L^1$ are quasi-arithmetic means.


Note 2.46. For each value of p the generalized logarithmic mean is strictly increasing
in x, y, hence it is an aggregation function.


Note 2.47. Generalized logarithmic means can be extended for n arguments using 
the mean value theorem for divided differences. 


2.4.5 Mean of Bajraktarevic 
Definition 2.48 (Mean of Bajraktarevic). Let w(t) = (wi(t),...,Wn(t)) 


be a vector of weight functions w; : [0,1] — [0,00[, and let g : [0,1] — [—o0, oo] 
be a strictly monotone function. The mean of Bajraktarevic is the function 


F =g | =————= Is (2.24) 
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Fig. 2.18. 3D plots of generalized logarithmic means L°( (identric mean) and L~+ 
(logarithmic mean). 


The Bajraktarevic mean is also called a mixture function when $g(t) = t$. The function $g$ is called the generating function of this mean. If $w_i(t) = w_i$ are constants for all $i = 1, \ldots, n$, it reduces to the quasi-arithmetic mean. The special case of the Gini mean $G^{p,q}$ is obtained by taking $w_i(t) = w_i t^{q}$ and $g(t) = t^{p-q}$ if $p > q$, or $g(t) = \log(t)$ if $p = q$.

The mean of Bajraktarevic is not generally an aggregation function because it can fail the monotonicity condition. The following sufficient condition for monotonicity of mixture functions has been established.

Let the weight functions $w_i(t) > 0$ be differentiable and monotone non-decreasing, and let $g(t) = t$. If $w_i'(t) \le w_i(t)$ for all $t \in [0,1]$ and all $i = 1, \ldots, n$, then $f$ in (2.24) is monotone non-decreasing (i.e., an aggregation function).
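The definition (2.24) can be illustrated with a minimal Python sketch. The function name `bajraktarevic_mean` and the particular weight functions $w_i(t) = e^{t/2}$ are illustrative assumptions, not taken from the text; these weight functions were chosen only because they satisfy the sufficient condition above.

```python
import math

def bajraktarevic_mean(x, weight_fns, g=lambda t: t, g_inv=lambda t: t):
    """Eq. (2.24): g^{-1}( sum w_i(x_i) g(x_i) / sum w_i(x_i) )."""
    num = sum(w(xi) * g(xi) for w, xi in zip(weight_fns, x))
    den = sum(w(xi) for w, xi in zip(weight_fns, x))
    if den == 0:
        raise ValueError("all weight functions vanish at the given inputs")
    return g_inv(num / den)

# Mixture function (g(t) = t) with hypothetical weight functions w_i(t) = exp(t/2).
# They are positive, non-decreasing and satisfy w_i'(t) = w_i(t)/2 <= w_i(t).
weights = [lambda t: math.exp(t / 2)] * 2
print(bajraktarevic_mean([0.2, 0.8], weights))

# Constant weight functions give a weighted quasi-arithmetic mean;
# with g = log this is the weighted geometric mean.
const = [lambda t: 0.25, lambda t: 0.75]
print(bajraktarevic_mean([0.2, 0.8], const, g=math.log, g_inv=math.exp))
```

With the exponential weights the larger input receives the larger weight, so the result lies above the arithmetic mean of the inputs.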
Fig. 2.19. 3D plots of two further generalized logarithmic means.


2.5 Ordered Weighted Averaging 


2.5.1 Definitions 


Ordered weighted averaging functions (OWA) also belong to the class of averaging aggregation functions. They differ from the weighted arithmetic means in that the weights are associated not with the particular inputs, but with their magnitude. In some applications, all inputs are equivalent, and the importance of an input is determined by its value. For example, when a robot navigates obstacles using several sensors, the largest input (the closest obstacle) is the most important. OWA are symmetric aggregation functions that allocate weights according to the input value. Thus OWA can emphasize the largest, the smallest or mid-range inputs. They were introduced by Yager and have become very popular in the fuzzy sets community.

We recall the notation $\mathbf{x}_{\searrow}$, which denotes the vector obtained from $\mathbf{x}$ by arranging its components in non-increasing order $x_{(1)} \ge x_{(2)} \ge \ldots \ge x_{(n)}$.


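Definition 2.49 (OWA). For a given weighting vector $\mathbf{w}$, $w_i \ge 0$, $\sum w_i = 1$, the OWA function is given by

\[ OWA_{\mathbf{w}}(\mathbf{x}) = \sum_{i=1}^{n} w_i x_{(i)} = \langle \mathbf{w}, \mathbf{x}_{\searrow} \rangle . \]

Calculation of the value of an OWA function involves using a sort() operation.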
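As a minimal sketch of Definition 2.49 (the function name `owa` and the validation tolerance are assumptions made for illustration), the computation amounts to sorting the inputs in non-increasing order and taking a dot product with the weights:

```python
def owa(x, w):
    """OWA_w(x) = sum_i w_i * x_(i), with x_(1) >= x_(2) >= ... >= x_(n)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(wi < 0 for wi in w) or abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("w must be non-negative and sum to one")
    xs = sorted(x, reverse=True)          # the sort() step mentioned above
    return sum(wi * xi for wi, xi in zip(w, xs))

x = [0.3, 0.9, 0.5]
print(owa(x, [1, 0, 0]))        # weight on the largest input only
print(owa(x, [0, 0, 1]))        # weight on the smallest input only
print(owa(x, [0.2, 0.5, 0.3]))  # emphasis on the mid-range input
```

Because the inputs are sorted first, permuting `x` does not change the result, which is the symmetry property of OWA.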


Special cases 


• If all weights are equal, $w_i = \frac{1}{n}$, OWA becomes the arithmetic mean, $OWA_{\mathbf{w}}(\mathbf{x}) = M(\mathbf{x})$;
• If $\mathbf{w} = (1,0,\ldots,0)$, then $OWA_{\mathbf{w}}(\mathbf{x}) = \max(\mathbf{x})$;
• If $\mathbf{w} = (0,\ldots,0,1)$, then $OWA_{\mathbf{w}}(\mathbf{x}) = \min(\mathbf{x})$;
• If $\mathbf{w} = (\alpha,0,\ldots,0,1-\alpha)$, then OWA becomes the Hurwicz aggregation function, $OWA_{\mathbf{w}}(\mathbf{x}) = \alpha \max(\mathbf{x}) + (1-\alpha)\min(\mathbf{x})$;
• If $w_i = 0$ for all $i$ except the $k$-th, and $w_k = 1$, then OWA becomes the $k$-th order statistic, $OWA_{\mathbf{w}}(\mathbf{x}) = x_{(k)}$.





Fig. 2.20. 3D plots of OWA functions $OWA_{(0.7,0.3)}$ and $OWA_{(0.2,0.8)}$.


Definition 2.50 (Reverse OWA). Given an OWA function $OWA_{\mathbf{w}}$, the reverse OWA is $OWA_{\mathbf{w}_d}$, with the weighting vector $\mathbf{w}_d = (w_n, w_{n-1}, \ldots, w_1)$.


2.5.2 Main properties 


• As with all averaging aggregation functions, OWA are non-decreasing (strictly increasing if all weights are positive) and idempotent;
• The dual of an OWA function is the reverse OWA, with the vector of weights $\mathbf{w}_d = (w_n, w_{n-1}, \ldots, w_1)$;
• OWA functions are continuous, symmetric, homogeneous and shift-invariant;
• OWA functions do not have neutral or absorbing elements, except for the special cases min and max;
• The OWA functions are special cases of the Choquet integral (see Section 2.6) with respect to symmetric fuzzy measures.


Orness measure 


The general expression for the measure of orness, given in (2.1), translates into the following simple formula

\[ orness(OWA_{\mathbf{w}}) = \sum_{i=1}^{n} w_i \frac{n-i}{n-1} = OWA_{\mathbf{w}}\left(1, \frac{n-2}{n-1}, \ldots, \frac{1}{n-1}, 0\right). \tag{2.25} \]


Here is a list of additional properties involving the orness value. 


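• The orness of OWA and its dual are related by $orness(OWA_{\mathbf{w}}) = 1 - orness(OWA_{\mathbf{w}_d})$.

• A self-dual OWA function has $orness(OWA_{\mathbf{w}}) = \frac{1}{2}$. The converse holds for $n \le 3$ but can fail for larger $n$: for example, $\mathbf{w} = (0.2, 0.4, 0.1, 0.3)$ has orness $\frac{1}{2}$ but $\mathbf{w} \ne \mathbf{w}_d$.

• In the special cases $orness(\max) = 1$, $orness(\min) = 0$, and $orness(M) = \frac{1}{2}$. Furthermore, the orness of OWA is 1 only if it is the max function and 0 only if it is the min function. However, orness can be $\frac{1}{2}$ for an OWA different from the arithmetic mean.

• If the weighting vector is non-increasing, i.e., $w_i \ge w_{i+1}$, $i = 1,\ldots,n-1$, then $orness(OWA_{\mathbf{w}}) \in [\frac{1}{2}, 1]$. If the weighting vector is non-decreasing, then $orness(OWA_{\mathbf{w}}) \in [0, \frac{1}{2}]$ (the larger weights then fall on the smaller inputs, since the inputs are sorted in non-increasing order).

• If two OWA functions with weighting vectors $\mathbf{w}_1, \mathbf{w}_2$ have their respective orness values $O_1, O_2$, and if $\mathbf{w}_3 = a\mathbf{w}_1 + (1-a)\mathbf{w}_2$, $a \in [0,1]$, then the OWA function with the weighting vector $\mathbf{w}_3$ has orness value

\[ orness(OWA_{\mathbf{w}_3}) = aO_1 + (1-a)O_2. \]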
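The orness formula (2.25) and the properties listed above can be checked numerically with a short sketch; the function name `orness` and the sample weighting vectors are illustrative assumptions.

```python
def orness(w):
    """Eq. (2.25): orness = sum_i w_i * (n - i) / (n - 1), i = 1..n."""
    n = len(w)
    if n < 2:
        raise ValueError("orness needs at least two weights")
    return sum(wi * (n - i) / (n - 1) for i, wi in enumerate(w, start=1))

w1 = [0.6, 0.3, 0.1, 0.0]          # non-increasing weights: orness >= 1/2
w2 = [0.1, 0.2, 0.3, 0.4]          # non-decreasing weights: orness <= 1/2
a = 0.25
w3 = [a * u + (1 - a) * v for u, v in zip(w1, w2)]

print(orness(w1), orness(w2))
print(orness(w1) + orness(w1[::-1]))                          # dual: sums to 1
print(orness(w3), a * orness(w1) + (1 - a) * orness(w2))      # linearity
print(orness([0.2, 0.4, 0.1, 0.3]))                           # 1/2, not symmetric
```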


Note 2.51. Of course, to determine an OWA weighting vector with the desired orness 
value, one can use many different combinations of w1, w2, which all result in different 
w3 but with the same orness value. 


Example 2.52. The measure of orness for some special weighting vectors has been precalculated:

\[ w_i = \frac{1}{n}\sum_{j=i}^{n}\frac{1}{j}, \qquad orness(OWA_{\mathbf{w}}) = \frac{3}{4}, \]

\[ w_i = \frac{2(n+1-i)}{n(n+1)}, \qquad orness(OWA_{\mathbf{w}}) = \frac{2}{3}. \]


Note 2.53. The classes of recursive and iterative OWA functions, which have the same orness value for any given $n$, were investigated in [243].

Entropy 

We recall the definition of weights dispersion (entropy), Definition 2.37, p. 57:

\[ Disp(\mathbf{w}) = -\sum_{i=1}^{n} w_i \log w_i. \]

It measures the degree to which all the information (i.e., all the inputs) is used in the aggregation process. The entropy is used to define the weights with the maximal entropy (functions called MEOWA), subject to a predefined orness value (details are in Section 2.5.5).

• If the orness is not specified, the maximum of Disp is achieved at $w_i = \frac{1}{n}$, i.e., the arithmetic mean, and $Disp(\frac{1}{n}, \ldots, \frac{1}{n}) = \log n$.

• The minimum value of Disp, 0, is achieved if and only if $w_i = 0$, $i \ne k$, and $w_k = 1$, i.e., the order statistic, see Section 2.8.2.

• The entropy of an OWA and its dual (reverse OWA) coincide, $Disp(\mathbf{w}) = Disp(\mathbf{w}_d)$.
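A minimal sketch of the dispersion measure follows. The function name `dispersion` is an assumption, and terms with $w_i = 0$ are skipped, using the usual convention $0 \log 0 = 0$.

```python
import math

def dispersion(w):
    """Disp(w) = -sum_i w_i log w_i, with the convention 0 * log 0 = 0."""
    return -sum(wi * math.log(wi) for wi in w if wi > 0)

n = 4
print(dispersion([1 / n] * n), math.log(n))   # maximum: log n
print(dispersion([0, 1, 0, 0]))               # minimum: an order statistic
w = [0.5, 0.3, 0.15, 0.05]
print(dispersion(w), dispersion(w[::-1]))     # OWA and reverse OWA coincide
```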


Similarly to the case of weighted quasi-arithmetic means, weighting triangles should be used if one needs to work with families of OWAs (i.e., OWA extended aggregation functions in the sense of Definition 1.6). We also note that there are other types of entropy (e.g., Rényi entropy) used to quantify weights dispersion, see, e.g., [266]. One such measure of dispersion is calculated as

\[ \frac{\sum_{i=1}^{n} \frac{i-1}{n-1}\,(w_{(i)} - w_{(i+1)})}{\sum_{i=1}^{n} (w_{(i)} - w_{(i+1)})} = \frac{1}{n-1}\,\frac{1 - w_{(1)}}{w_{(1)}}, \]

where $w_{(i)}$ denotes the $i$-th largest weight (with $w_{(n+1)} = 0$). A related measure of weights dispersion is $\rho(\mathbf{w}) = 1 - w_{(1)}$. Another useful measure is the weights variance, see Eq. (2.38) on p. 79.


2.5.3 Other types of OWA functions 
Weighted OWA 


The weights in weighted means and OWA functions represent different aspects. In weighted means $w_i$ reflects the importance of the $i$-th input, whereas in OWA it reflects the importance of the $i$-th largest input. Torra proposed a generalization of both weighted means and OWA, called weighted OWA (WOWA). This aggregation function has two sets of weights $\mathbf{w}, \mathbf{p}$. Vector $\mathbf{p}$ plays the same role as the weighting vector in weighted means, and $\mathbf{w}$ plays the role of the weighting vector in OWA functions.

Consider the following motivation. A robot needs to combine information 
coming from n different sensors, which provide distances to the obstacles. 
The reliability of the sensors is known (i.e., we have weights p). However, 
independent of their reliability, the distances to the nearest obstacles are more 
important, so irrespective of the reliability of each sensor, their inputs are also 
weighted according to their numerical value, hence we have another weighting 
vector w. Thus both factors, the size of the inputs and the reliability of the 
inputs, need to be taken into account. WOWA provides exactly this type of 
aggregation function. 

The WOWA function becomes the weighted arithmetic mean if $w_i = \frac{1}{n}$, $i = 1,\ldots,n$, and becomes the usual OWA if $p_i = \frac{1}{n}$, $i = 1,\ldots,n$.

Definition 2.54 (Weighted OWA). Let $\mathbf{w}, \mathbf{p}$ be two weighting vectors, $w_i, p_i \ge 0$, $\sum w_i = \sum p_i = 1$. The following function is called the Weighted OWA function

\[ WOWA_{\mathbf{w},\mathbf{p}}(\mathbf{x}) = \sum_{i=1}^{n} u_i x_{(i)}, \]

where $x_{(i)}$ is the $i$-th largest component of $\mathbf{x}$, and the weights $u_i$ are defined as

\[ u_i = g\left(\sum_{j \in H_i} p_j\right) - g\left(\sum_{j \in H_{i-1}} p_j\right), \]

where the set $H_i = \{j \mid x_j \ge x_{(i)}\}$ is the set of indices of the $i$ largest elements of $\mathbf{x}$, and $g$ is a monotone non-decreasing function with two properties:

1. $g(i/n) = \sum_{j \le i} w_j$, $i = 0,\ldots,n$ (of course $g(0) = 0$);
2. $g$ is linear if the points $(i/n, \sum_{j \le i} w_j)$ lie on a straight line.


Thus computation of WOWA involves a very similar procedure as that of 
OWA (i.e., sorting components of x and then computing their weighted sum), 
but the weights u; are defined by using both vectors w, p, a special monotone 
function g, and depend on the components of x as well. One can see WOWA 
as an OWA function with the weights u. Let us list some of the properties of 
WOWA. 


• First, the weighting vector $\mathbf{u}$ satisfies $u_i \ge 0$, $\sum u_i = 1$.
• If $w_i = \frac{1}{n}$, then $WOWA_{\mathbf{w},\mathbf{p}}(\mathbf{x}) = M_{\mathbf{p}}(\mathbf{x})$, the weighted arithmetic mean.
• If $p_i = \frac{1}{n}$, then $WOWA_{\mathbf{w},\mathbf{p}}(\mathbf{x}) = OWA_{\mathbf{w}}(\mathbf{x})$.
• WOWA is an idempotent aggregation function.


Of course, the weights $\mathbf{u}$ also depend on the generating function $g$. This function can be chosen as a linear spline (i.e., a broken line interpolant), interpolating the points $(i/n, \sum_{j \le i} w_j)$ (in which case it automatically becomes a linear function if these points are on a straight line), or as a monotone quadratic spline, as has been suggested in the literature; Schumaker's quadratic spline algorithm [219] has been used for this purpose, and it automatically satisfies the straight line condition when needed.
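The following sketch computes WOWA with the broken-line (piecewise-linear) choice of $g$ mentioned above. The helper names `make_g` and `wowa` are assumptions made for illustration, and the monotone quadratic spline alternative is not implemented here.

```python
from itertools import accumulate

def make_g(w):
    """Broken-line interpolant with g(i/n) = w_1 + ... + w_i, g(0) = 0."""
    n = len(w)
    knots = [0.0] + list(accumulate(w))
    def g(t):
        t = min(max(t, 0.0), 1.0)          # guard against rounding outside [0, 1]
        k = min(int(t * n), n - 1)         # index of the linear segment
        return knots[k] + (t * n - k) * (knots[k + 1] - knots[k])
    return g

def wowa(x, w, p):
    """WOWA_{w,p}(x) = sum_i u_i x_(i), u_i = g(P_i) - g(P_{i-1})."""
    g = make_g(w)
    order = sorted(range(len(x)), key=lambda j: x[j], reverse=True)
    total, cum_p, g_prev = 0.0, 0.0, 0.0
    for j in order:                        # walk inputs from largest to smallest
        cum_p += p[j]                      # sum of p over the i largest inputs
        g_cur = g(cum_p)
        total += (g_cur - g_prev) * x[j]   # u_i * x_(i)
        g_prev = g_cur
    return total

x = [0.3, 0.9, 0.5]
p = [0.2, 0.5, 0.3]     # hypothetical sensor reliabilities
w = [0.6, 0.3, 0.1]     # hypothetical emphasis on the largest readings
print(wowa(x, w, p))
print(wowa(x, [1/3] * 3, p))   # uniform w: weighted arithmetic mean with p
print(wowa(x, w, [1/3] * 3))   # uniform p: ordinary OWA with w
```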

It turns out that WOWA belongs to a more general class of Choquet integral based aggregation functions, discussed in Section 2.6, with respect to distorted probabilities (see Definition 2.113 and [193, 233, 237]). It is a piecewise linear function whose linear segments are defined on the simplicial partition of the unit cube $[0,1]^n$: $S_i = \{\mathbf{x} \in [0,1]^n \mid x_{p(j)} \ge x_{p(j+1)}\}$, where $p$ is a permutation of the set $\{1,\ldots,n\}$. Note that there are exactly $n!$ possible permutations, the union of all $S_i$ is $[0,1]^n$, and the interiors of $S_i$ and $S_j$ do not intersect for $i \ne j$.


Neat OWA 


OWA functions have been generalized to functions whose weights depend on 
the aggregated inputs. 


Definition 2.55 (Neat OWA). An OWA function whose weights are defined by

\[ w_i = \frac{x_{(i)}^{p}}{\sum_{j=1}^{n} x_{(j)}^{p}}, \]

with $p \in \,]-\infty, \infty[$ is called a neat OWA.


Note 2.56. Neat OWA functions are counter-harmonic means (see p. 63). Recall that they are not monotone (hence not aggregation functions).
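A short sketch makes the lack of monotonicity in Note 2.56 concrete. The function name `neat_owa` is an assumption, and the inputs are assumed strictly positive so that the powers are defined for any $p$.

```python
def neat_owa(x, p):
    """Neat OWA: weights w_i = x_(i)^p / sum_j x_(j)^p (inputs assumed > 0)."""
    xs = sorted(x, reverse=True)
    powers = [xi ** p for xi in xs]
    total = sum(powers)
    return sum((v / total) * xi for v, xi in zip(powers, xs))

# With p = 1 this is the counter-harmonic mean sum(x^2) / sum(x).
print(neat_owa([0.1, 1.0], 1))
print(neat_owa([0.2, 1.0], 1))   # an input increased, yet the output decreases
```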


2.5.4 Generalized OWA 


Similarly to quasi-arithmetic means (Section 2.3), OWA functions have been generalized with the help of generating functions $g : [0,1] \to [-\infty, \infty]$ as follows.


Definition 2.57 (Generalized OWA). Let $g : [0,1] \to [-\infty, \infty]$ be a continuous strictly monotone function and let $\mathbf{w}$ be a weighting vector. The function

\[ GenOWA_{\mathbf{w},g}(\mathbf{x}) = g^{-1}\left(\sum_{i=1}^{n} w_i\, g(x_{(i)})\right) \tag{2.26} \]

is called a generalized OWA (also known as ordered weighted quasi-arithmetic mean). As for OWA, $x_{(i)}$ denotes the $i$-th largest value of $\mathbf{x}$.


Special cases 
The ordered weighted geometric function was studied in [125].


Definition 2.58 (Ordered Weighted Geometric function (OWG)). For a given weighting vector $\mathbf{w}$, the OWG function is

\[ OWG_{\mathbf{w}}(\mathbf{x}) = \prod_{i=1}^{n} x_{(i)}^{w_i}. \tag{2.27} \]


Note 2.59. Similarly to the weighted geometric mean, OWG is a special case of (2.26) with the generating function $g = \log$.

Definition 2.60 (Ordered Weighted Harmonic function (OWH)). For a given weighting vector $\mathbf{w}$, the OWH function is

\[ OWH_{\mathbf{w}}(\mathbf{x}) = \left(\sum_{i=1}^{n} \frac{w_i}{x_{(i)}}\right)^{-1}. \tag{2.28} \]

A large family of generalized OWA functions is based on power functions, similar to weighted power means [271]. Let $g_r$ denote the family of power functions

\[ g_r(t) = \begin{cases} t^{r}, & \text{if } r \ne 0, \\ \log(t), & \text{if } r = 0. \end{cases} \]


Definition 2.61 (Power-based generalized OWA). For a given weighting vector $\mathbf{w}$, and a value $r \in \mathbb{R}$, the function

\[ GenOWA_{\mathbf{w},[r]}(\mathbf{x}) = \left(\sum_{i=1}^{n} w_i x_{(i)}^{r}\right)^{1/r} \tag{2.29} \]

if $r \ne 0$, and $GenOWA_{\mathbf{w},[r]}(\mathbf{x}) = OWG_{\mathbf{w}}(\mathbf{x})$ if $r = 0$, is called a power-based generalized OWA.


Of course, both OWG and OWH functions are special cases of power-based OWA with $r = 0$ and $r = -1$ respectively. The usual OWA corresponds to $r = 1$. Another special case is that of quadratic OWA, $r = 2$, given by

\[ OWQ_{\mathbf{w}}(\mathbf{x}) = \sqrt{\sum_{i=1}^{n} w_i x_{(i)}^{2}}. \]

Other generating functions can also be used to define generalized OWA functions.
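The power-based family (2.29) can be sketched in a few lines. The function name `gen_owa_power` is an assumption, as is the treatment of zero inputs for $r < 0$, where the limiting value 0 is returned.

```python
import math

def gen_owa_power(x, w, r):
    """Power-based generalized OWA, Eq. (2.29); r = 0 gives OWG."""
    xs = sorted(x, reverse=True)
    pairs = [(wi, xi) for wi, xi in zip(w, xs) if wi > 0]
    if r == 0:
        return math.prod(xi ** wi for wi, xi in pairs)        # OWG, Eq. (2.27)
    if r < 0 and any(xi == 0 for _, xi in pairs):
        return 0.0                                             # limiting value
    return sum(wi * xi ** r for wi, xi in pairs) ** (1.0 / r)

x, w = [0.2, 0.8], [0.7, 0.3]
for r in (-1, 0, 1, 2):          # OWH, OWG, OWA, quadratic OWA
    print(r, gen_owa_power(x, w, r))
```

For a fixed weighting vector the printed values are non-decreasing in $r$, as for weighted power means.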


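Example 2.62 (Trigonometric OWA). Let $g_1(t) = \sin(\frac{\pi}{2}t)$, $g_2(t) = \cos(\frac{\pi}{2}t)$, and $g_3(t) = \tan(\frac{\pi}{2}t)$ be the generating functions. The trigonometric OWA functions are the functions

\[ OWAS_{\mathbf{w}}(\mathbf{x}) = \frac{2}{\pi}\arcsin\left(\sum_{i=1}^{n} w_i \sin\left(\frac{\pi}{2}x_{(i)}\right)\right), \]

\[ OWAC_{\mathbf{w}}(\mathbf{x}) = \frac{2}{\pi}\arccos\left(\sum_{i=1}^{n} w_i \cos\left(\frac{\pi}{2}x_{(i)}\right)\right), \text{ and} \]

\[ OWAT_{\mathbf{w}}(\mathbf{x}) = \frac{2}{\pi}\arctan\left(\sum_{i=1}^{n} w_i \tan\left(\frac{\pi}{2}x_{(i)}\right)\right). \]

Fig. 2.21. 3D plots of OWH functions $OWH_{(0.9,0.1)}$ and $OWH_{(0.2,0.8)}$.

Fig. 2.22. 3D plots of quadratic OWA functions $OWQ_{(0.7,0.3)}$ and $OWQ_{(0.2,0.8)}$.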


Example 2.63 (Exponential OWA). Let the generating function be

\[ g(t) = \begin{cases} \gamma^{t}, & \text{if } \gamma \ne 1, \\ t, & \text{if } \gamma = 1. \end{cases} \]

The exponential OWA is the function

\[ OWAE_{\mathbf{w},\gamma}(\mathbf{x}) = \begin{cases} \log_{\gamma}\left(\sum_{i=1}^{n} w_i \gamma^{x_{(i)}}\right), & \text{if } \gamma \ne 1, \\ OWA_{\mathbf{w}}(\mathbf{x}), & \text{if } \gamma = 1. \end{cases} \]


Example 2.64 (Radical OWA). Let $\gamma > 0$, $\gamma \ne 1$, and let the generating function be

\[ g(t) = \gamma^{1/t}. \]

The radical OWA is the function

\[ OWAR_{\mathbf{w},\gamma}(\mathbf{x}) = \left(\log_{\gamma}\left(\sum_{i=1}^{n} w_i \gamma^{1/x_{(i)}}\right)\right)^{-1}. \]

3D plots of some generalized OWA functions are presented on Figures 2.22-2.24.

Fig. 2.23. 3D plots of radical OWA functions $OWAR_{(0.9,0.1),100}$ and $OWAR_{(0.999,0.001),100}$.

Fig. 2.24. 3D plots of trigonometric OWA functions $OWAS_{(0.9,0.1)}$ and $OWAT_{(0.2,0.8)}$.

2.5.5 How to choose weights in OWA
Methods based on data 


The problem of identification of weights of OWA functions was studied by several authors (see, e.g., [276]). A common feature of all methods is to eliminate the nonlinearity due to reordering of the components of $\mathbf{x}$ by restricting the domain of this function to the simplex $S \subset [0,1]^n$ defined by the inequalities $x_1 \ge x_2 \ge \ldots \ge x_n$. On that domain the OWA function is a linear function (it coincides with the weighted arithmetic mean). Once the coefficients of this function are found, the OWA function can be computed on the whole $[0,1]^n$ by using its symmetry. Algorithmically, it amounts to using an auxiliary data set $\{(\mathbf{z}_k, y_k)\}$, where vectors $\mathbf{z}_k = \mathbf{x}_{k\searrow}$. Thus identification of weights of OWA functions is a very similar problem to identification of weights of arithmetic means in Section 2.3.7. Depending on whether we use the least squares or the least absolute deviation criterion, we solve it by using either quadratic or linear programming techniques. In the first case we have the problem

\[ \min \sum_{k=1}^{K}\left(\sum_{i=1}^{n} w_i z_{ik} - y_k\right)^{2} \tag{2.30} \]
\[ \text{s.t. } \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0, \ i = 1,\ldots,n. \]

In the second case we have

\[ \min \sum_{k=1}^{K}\left|\sum_{i=1}^{n} w_i z_{ik} - y_k\right| \tag{2.31} \]
\[ \text{s.t. } \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0, \ i = 1,\ldots,n, \]

which converts to a linear programming problem, see Appendix A.2.
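The data preparation step and the two objective functions (2.30) and (2.31) can be sketched as follows. This is only the setup: the constrained minimization itself is left to a QP or LP solver, and the names `sorted_dataset`, `least_squares_objective` and `lad_objective`, as well as the sample data, are assumptions made for illustration.

```python
def sorted_dataset(data):
    """Build the auxiliary data set {(z_k, y_k)} with z_k = x_k sorted non-increasingly."""
    return [(sorted(x, reverse=True), y) for x, y in data]

def least_squares_objective(w, zdata):
    """Objective of problem (2.30)."""
    return sum((sum(wi * zi for wi, zi in zip(w, z)) - y) ** 2 for z, y in zdata)

def lad_objective(w, zdata):
    """Objective of problem (2.31)."""
    return sum(abs(sum(wi * zi for wi, zi in zip(w, z)) - y) for z, y in zdata)

# Hypothetical data generated by an OWA with weights w_true.
w_true = [0.5, 0.3, 0.2]
inputs = [[0.3, 0.9, 0.5], [0.1, 0.2, 0.8], [0.6, 0.4, 0.7]]
data = [(x, sum(wi * zi for wi, zi in zip(w_true, sorted(x, reverse=True))))
        for x in inputs]
zdata = sorted_dataset(data)

print(least_squares_objective(w_true, zdata))      # zero residual at the true weights
print(least_squares_objective([1/3] * 3, zdata))   # positive for the arithmetic mean
print(lad_objective([1/3] * 3, zdata))
```

Because the objective is written in terms of the sorted vectors, it is an ordinary linear regression in $\mathbf{w}$, which is why standard QP and LP techniques apply.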

Filev and Yager proposed a nonlinear change of variables to obtain an unrestricted minimization problem, which they propose to solve using nonlinear local optimization methods. Unfortunately the resulting nonlinear optimization problem is difficult due to a large number of local minimizers, and the traditional optimization methods get stuck in the local minima.

The approach relying on quadratic programming has been used by several authors (see, e.g., [235, 236, 275]), and it was shown to be numerically efficient and stable with respect to rank deficiency (e.g., when $K < n$, or the data are linearly dependent).

Often an additional requirement is imposed: the desired value of the measure of orness, $orness(f) = \alpha \in [0,1]$. This requirement is easily incorporated into a QP or LP problem as an additional linear equality constraint, namely

\[ \sum_{i=1}^{n} w_i \frac{n-i}{n-1} = \alpha. \]

Preservation of ordering of the outputs


We may also require that the ordering of the outputs is preserved, i.e., if $y_j \le y_k$ then we expect $f(\mathbf{x}_j) \le f(\mathbf{x}_k)$ (see Section 1.6, p. 34). We arrange the data so that the outputs are in non-decreasing order, $y_k \le y_{k+1}$, $k = 1,\ldots,K-1$. Then we define the additional linear constraints

\[ \langle \mathbf{z}_{k+1} - \mathbf{z}_k, \mathbf{w} \rangle = \sum_{i=1}^{n} w_i (z_{i,k+1} - z_{ik}) \ge 0, \]

$k = 1,\ldots,K-1$. We add the above constraints to problem (2.30) or (2.31) and solve it. The addition of an extra $K-1$ constraints neither changes the structure of the optimization problem, nor does it drastically affect its complexity.
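The construction of the ordering constraints can be sketched as below. The names `ordering_constraints` and `preserves_ordering`, the tolerance, and the sample data are illustrative assumptions; in practice the rows would be passed to a QP or LP solver as inequality constraints.

```python
def ordering_constraints(data):
    """Rows d_k = z_{k+1} - z_k after sorting the data by output y_k.

    Each row d_k encodes the linear constraint <d_k, w> >= 0.
    """
    zdata = sorted(((sorted(x, reverse=True), y) for x, y in data),
                   key=lambda zy: zy[1])
    return [[b - a for a, b in zip(z0, z1)]
            for (z0, _), (z1, _) in zip(zdata, zdata[1:])]

def preserves_ordering(w, rows, tol=1e-12):
    """Check whether a weighting vector satisfies all K - 1 constraints."""
    return all(sum(wi * di for wi, di in zip(w, row)) >= -tol for row in rows)

# Hypothetical data: three input vectors with observed outputs.
data = [([0.3, 0.9, 0.5], 0.66), ([0.1, 0.2, 0.8], 0.48), ([0.6, 0.4, 0.7], 0.61)]
rows = ordering_constraints(data)
print(rows)
print(preserves_ordering([0.5, 0.3, 0.2], rows))   # consistent with the outputs
print(preserves_ordering([1.0, 0.0, 0.0], rows))   # max would reverse one pair
```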


Methods based on a measure of dispersion 


Maximum entropy OWA 


A different approach to choosing OWA weights does not use any empirical data, but relies on various measures of weight entropy or dispersion. The measure of weights dispersion (see Definition 2.37 on p. 57, also see p. 70) is defined as

\[ Disp(\mathbf{w}) = -\sum_{i=1}^{n} w_i \log w_i. \tag{2.32} \]

The idea is to choose, for a given $n$ and a given orness value $\alpha$, the vector of weights that maximizes the dispersion $Disp(\mathbf{w})$. It is formulated as the optimization problem

\[ \min \sum_{i=1}^{n} w_i \log w_i \tag{2.33} \]
\[ \text{s.t. } \sum_{i=1}^{n} w_i = 1, \quad \sum_{i=1}^{n} w_i \frac{n-i}{n-1} = \alpha, \quad w_i \ge 0, \ i = 1,\ldots,n. \]

The solution is called Maximum Entropy OWA (MEOWA). Using the method of Lagrange multipliers, one obtains the following expressions for $w_i$:

\[ w_i = \left(w_1^{\,n-i}\, w_n^{\,i-1}\right)^{\frac{1}{n-1}}, \quad i = 2,\ldots,n-1, \tag{2.34} \]

\[ w_n = \frac{((n-1)\alpha - n)w_1 + 1}{(n-1)\alpha + 1 - nw_1}, \]

and $w_1$ being the unique solution to the equation

\[ w_1\left[(n-1)\alpha + 1 - nw_1\right]^{n} = ((n-1)\alpha)^{n-1}\left[((n-1)\alpha - n)w_1 + 1\right] \tag{2.35} \]

on the interval $(0, \frac{1}{n})$ when $\alpha < \frac{1}{2}$; for $\alpha > \frac{1}{2}$ the relevant solution lies in $(\frac{1}{n}, 1)$, while $w_1 = \frac{1}{n}$ always satisfies (2.35) and corresponds to $\alpha = \frac{1}{2}$.

Note 2.65. For $n = 3$, we obtain $w_2 = \sqrt{w_1 w_3}$ independently of the value of $\alpha$.


A different representation of the same solution has also been given. Let $t$ be the (unique) positive solution to the equation

\[ dt^{n-1} + (d+1)t^{n-2} + \ldots + (d+n-2)t + (d+n-1) = 0, \tag{2.36} \]

with $d = -\alpha(n-1)$. Then the MEOWA weights are identified from

\[ w_i = \frac{t^{i}}{T}, \quad i = 1,\ldots,n, \quad \text{where } T = \sum_{j=1}^{n} t^{j}. \tag{2.37} \]
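The representation (2.36)-(2.37) lends itself to a simple numerical procedure. The sketch below finds the positive root of (2.36) by bisection, which is a choice made here for illustration rather than a method prescribed by the text; the function name `meowa_weights` and the tolerance are also assumptions, and $0 < \alpha < 1$ is required.

```python
def meowa_weights(n, alpha, tol=1e-12):
    """MEOWA weights via Eqs. (2.36)-(2.37), for 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    d = -alpha * (n - 1)

    def poly(t):
        # d t^(n-1) + (d+1) t^(n-2) + ... + (d+n-1), evaluated by Horner's rule
        value = 0.0
        for k in range(n):
            value = value * t + (d + k)
        return value

    lo, hi = 0.0, 1.0            # poly(0) = (n-1)(1-alpha) > 0
    while poly(hi) > 0:          # leading coefficient d < 0, so this terminates
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if poly(mid) > 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    powers = [t ** i for i in range(1, n + 1)]
    T = sum(powers)
    return [v / T for v in powers]

w = meowa_weights(5, 0.7)
print(w)
print(sum(wi * (5 - i) / 4 for i, wi in enumerate(w, start=1)))  # recovers alpha
```

The resulting weights form a geometric progression, which is consistent with (2.34) and with Note 2.65.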

Note 2.66. It is not difficult to check that both (2.34) and (2.37) represent the same set of weights, noting that $t = \left(\frac{w_n}{w_1}\right)^{\frac{1}{n-1}}$, or $w_1 = \frac{1-t}{1-t^{n}}$, and that substituting $w_1$ into (2.35) yields

\[ 1 - t^{n} = \frac{n(1-t)}{1 - d(1-t)}, \]

which translates into

\[ \frac{1-t^{n}}{1-t} - d(1-t^{n}) - n = 0, \]

and then into

\[ dt^{n} + t^{n-1} + t^{n-2} + \ldots + t + (1 - d - n) = 0. \]

After factoring out $(t - 1)$ we obtain (2.36).





Minimum variance OWA 


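Another popular characteristic of a weighting vector is the weights variance, defined as

\[ D^{2}(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}(w_i - M(\mathbf{w}))^{2} = \frac{1}{n}\sum_{i=1}^{n} w_i^{2} - \frac{1}{n^{2}}, \tag{2.38} \]

where $M(\mathbf{w})$ is the arithmetic mean of $\mathbf{w}$.

Here one minimizes $D^{2}(\mathbf{w})$ subject to a given orness measure. The resulting OWA function is called Minimum Variance OWA (MVOWA). Since adding a constant to the objective function, or multiplying it by a positive constant, does not change the minimizer, this is equivalent to the problem

\[ \min \sum_{i=1}^{n} w_i^{2} \tag{2.39} \]
\[ \text{s.t. } \sum_{i=1}^{n} w_i \frac{n-i}{n-1} = \alpha, \quad \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0, \ i = 1,\ldots,n. \]

For $\alpha = \frac{1}{2}$ the optimal solution is always $w_j = \frac{1}{n}$, $j = 1,\ldots,n$. It is also worth noting that the optimal solution $\mathbf{w}^{*}$ to (2.39) for $\alpha > \frac{1}{2}$ is related to the optimal solution $\mathbf{w}$ for $1 - \alpha < \frac{1}{2}$ by $w_i^{*} = w_{n-i+1}$, i.e., it gives the reverse OWA. Thus it is sufficient to establish the optimal solution in the case $\alpha < \frac{1}{2}$.

The optimal solution [101] for $\alpha < \frac{1}{2}$ is given as the vector $\mathbf{w} = (0,0,\ldots,0,w_r,\ldots,w_n)$, i.e., $w_j = 0$ if $j < r$, and

\[ w_r = \frac{6(n-1)\alpha - 2(n-r-1)}{(n-r+1)(n-r+2)}, \]

\[ w_n = \frac{2(2n-2r+1) - 6(n-1)\alpha}{(n-r+1)(n-r+2)}, \]

and

\[ w_j = w_r + \frac{j-r}{n-r}(w_n - w_r), \quad r < j < n. \]

The index $r$ depends on the value of $\alpha$, and is found from the inequalities

\[ n - 3(n-1)\alpha - 1 < r \le n - 3(n-1)\alpha \]

(with $r = 1$ whenever these inequalities would give $r < 1$).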
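The analytical MVOWA solution can be sketched as follows. The function name `mvowa_weights` is an assumption, the index $r$ is clamped to be at least 1 (this covers values of $\alpha$ close to $\frac{1}{2}$, where all weights are positive), and the case $\alpha > \frac{1}{2}$ is handled by reversing the solution for $1 - \alpha$.

```python
import math

def mvowa_weights(n, alpha):
    """Minimum variance OWA weights for n >= 2 and orness alpha in [0, 1]."""
    if n < 2 or not 0 <= alpha <= 1:
        raise ValueError("need n >= 2 and 0 <= alpha <= 1")
    if alpha > 0.5:
        return mvowa_weights(n, 1 - alpha)[::-1]     # reverse OWA
    r = max(1, math.floor(n - 3 * (n - 1) * alpha))  # first non-zero weight
    denom = (n - r + 1) * (n - r + 2)
    w_r = (6 * (n - 1) * alpha - 2 * (n - r - 1)) / denom
    w_n = (2 * (2 * n - 2 * r + 1) - 6 * (n - 1) * alpha) / denom
    w = [0.0] * n
    for j in range(r, n + 1):
        frac = (j - r) / (n - r) if n > r else 0.0
        w[j - 1] = w_r + frac * (w_n - w_r)          # linear interpolation
    return w

w = mvowa_weights(5, 0.1)
print(w)
print(sum(w), sum(wi * (5 - i) / 4 for i, wi in enumerate(w, start=1)))
print(mvowa_weights(4, 0.5))
```

The non-zero weights are equally spaced, so a small orness value concentrates the weight on the last few (smallest) inputs.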


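Recently it was established that the solution to the minimum variance OWA weights problem is equivalent to that of minimax disparity [251], i.e., the solution to

\[ \min \left\{ \max_{i=1,\ldots,n-1} |w_i - w_{i+1}| \right\} \tag{2.40} \]
\[ \text{s.t. } \sum_{i=1}^{n} w_i \frac{n-i}{n-1} = \alpha, \quad \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0, \ i = 1,\ldots,n. \]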


We reiterate that the weights of OWA functions obtained as solutions to maximum entropy or minimum variance problems are fixed for any given $n$ and orness measure, and can be precomputed. However, both criteria are also useful for data driven weights identification (Section 2.5.5), if there are multiple optimal solutions. Then the solution maximizing $Disp(\mathbf{w})$ or minimizing $D^{2}(\mathbf{w})$ is chosen. Torra [235] proposes to solve an auxiliary univariate optimization problem to maximize weights dispersion, subject to a given value of (2.30). On the other hand, one can fit the orness value $\alpha$ of MEOWA or MVOWA to empirical data, using a univariate nonlinear optimization method, in which at each iteration the vector $\mathbf{w}$ is computed using analytical solutions to problems (2.33) and (2.39).

Furthermore, it is possible to include both criteria directly into problem (2.30). It is especially convenient for the minimum variance criterion, as it yields a modified quadratic programming problem

\[ \min \sum_{k=1}^{K}\left(\sum_{i=1}^{n} w_i z_{ik} - y_k\right)^{2} + \lambda\sum_{i=1}^{n} w_i^{2} \tag{2.41} \]
\[ \text{s.t. } \sum_{i=1}^{n} w_i \frac{n-i}{n-1} = \alpha, \quad \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0, \ i = 1,\ldots,n, \]

where $\lambda \ge 0$ is a user-specified parameter controlling the balance between the criterion of fitting the data and that of obtaining minimum variance weights.


Methods based on weight generating functions 


Yager has proposed to use monotone continuous functions $Q : [0,1] \to [0,1]$, $Q(0) = 0$, $Q(1) = 1$, called Basic Unit-interval Monotone (BUM) functions, or Regular Increasing Monotone (RIM) quantifiers [264]. These functions generate OWA weights for any $n$ using (see Section 2.3.5)

\[ w_i = Q\left(\frac{i}{n}\right) - Q\left(\frac{i-1}{n}\right). \tag{2.42} \]

RIM quantifiers are fuzzy linguistic quantifiers¹³ that express the concept of fuzzy majority. Yager defined such quantifiers for fuzzy sets “for all”, “there exists”, “identity”, “most”, “at least half”, “as many as possible” as follows.

• “for all”: $Q_{forall}(t) = 0$ for all $t \in [0,1)$ and $Q_{forall}(1) = 1$;
• “there exists”: $Q_{exists}(t) = 1$ for all $t \in (0,1]$ and $Q_{exists}(0) = 0$;
• “identity”: $Q_{Id}(t) = t$.

Other mentioned quantifiers are expressed by

\[ Q_{a,b}(t) = \begin{cases} 0, & \text{if } t \le a, \\ \frac{t-a}{b-a}, & \text{if } a < t < b, \\ 1, & \text{if } t \ge b. \end{cases} \tag{2.43} \]

Then we can choose pairs $(a,b) = (0.3, 0.8)$ for “most”, $(a,b) = (0, 0.5)$ for “at least half” and $(a,b) = (0.5, 1)$ for “as many as possible”. Calculation of weights results in the following OWA:

• “for all”: $\mathbf{w} = (0,0,\ldots,0,1)$, $OWA_{\mathbf{w}} = \min$.
• “there exists”: $\mathbf{w} = (1,0,0,\ldots,0)$, $OWA_{\mathbf{w}} = \max$.
• “identity”: $\mathbf{w} = (\frac{1}{n},\ldots,\frac{1}{n})$, $OWA_{\mathbf{w}} = M$.

¹³ I.e., $Q$ is a monotone increasing function $[0,1] \to [0,1]$, $Q(0) = 0$, $Q(1) = 1$, whose value $Q(t)$ represents the degree to which $t$ satisfies the fuzzy concept represented by the quantifier.


Example 2.67. Consider the linguistic quantifier “most”, given by (2.43) with $(a,b) = (0.3, 0.8)$ and $n = 5$. The weighting vector is then $(0, 0.2, 0.4, 0.4, 0)$.
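Example 2.67 can be reproduced with a few lines of Python. The helper names `rim_weights` and `linear_quantifier` are assumptions made for this sketch; the formulas are (2.42) and (2.43).

```python
def rim_weights(Q, n):
    """OWA weights from a quantifier Q via Eq. (2.42)."""
    return [Q(i / n) - Q((i - 1) / n) for i in range(1, n + 1)]

def linear_quantifier(a, b):
    """Piecewise-linear quantifier Q_{a,b} of Eq. (2.43)."""
    def Q(t):
        if t <= a:
            return 0.0
        if t >= b:
            return 1.0
        return (t - a) / (b - a)
    return Q

most = linear_quantifier(0.3, 0.8)
print(rim_weights(most, 5))                                  # Example 2.67
print(rim_weights(lambda t: 1.0 if t >= 1 else 0.0, 4))      # "for all": min
print(rim_weights(lambda t: 0.0 if t <= 0 else 1.0, 4))      # "there exists": max
print(rim_weights(lambda t: t, 4))                           # "identity": mean
```

The same quantifier yields a weighting vector for every $n$, which is what makes this approach convenient when the number of inputs varies.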


Weight generating functions are applied to generate weights of both quasi-arithmetic means and OWA functions. They allow one to compute the degree of orness of an OWA function in the limiting case

\[ \lim_{n \to \infty} orness(f_n) = orness(Q) = \int_{0}^{1} Q(t)\,dt. \]

Entropy and other characteristics can also be computed based on $Q$, see [244].
Yager has proposed using generating, or stress, functions, defined as follows.


Definition 2.68 (Generating function of RIM quantifiers). Let $q : [0,1] \to [0,\infty]$ be an (integrable) function. It is a generating function of the RIM quantifier $Q$, if

\[ Q(t) = \frac{1}{K}\int_{0}^{t} q(u)\,du, \]

where $K = \int_{0}^{1} q(u)\,du$ is the normalization constant. The normalized generating function will be referred to as $\tilde{q}(t) = \frac{q(t)}{K}$.

Note 2.69. The generating function has the properties of a density function (e.g., a probability distribution density, although $Q$ is not necessarily interpreted as a probability). If $Q$ is differentiable, we may put $q(t) = Q'(t)$. Of course, for a given $Q$, if a generating function exists, it is not unique.


Note 2.70. In general, $Q$ need not be continuous to have a generating function. For example, it may be generated by Dirac's delta function¹⁴

\[ \delta(t) = \begin{cases} \infty, & \text{if } t = 0, \\ 0 & \text{otherwise,} \end{cases} \]

constrained by $\int_{-\infty}^{\infty} \delta(t)\,dt = 1$.

By using the generating function we generate the weights as

\[ \tilde{w}_i = \tilde{q}\left(\frac{i}{n}\right)\frac{1}{n} = \frac{1}{K}\, q\left(\frac{i}{n}\right)\frac{1}{n}. \]

Note that these weights provide an approximation to the weights generated by (2.42), and that they do not necessarily sum to one. To ensure the latter, we shall use the weights

\[ w_i = \frac{q\left(\frac{i}{n}\right)}{\sum_{j=1}^{n} q\left(\frac{j}{n}\right)}. \tag{2.44} \]

¹⁴ This is an informal definition. The proper definition involves the concepts of distributions and measures.


Eq. (2.44) provides an alternative method for OWA weight generation, independent of $Q$, while at the same time it gives an approximation to the weights provided by (2.42). Various interpretations of generating functions are provided in [274], from which we quote just a few examples.


Example 2.71.

• A constant generating function $q(t) = 1$ generates weights $w_i = \frac{1}{n}$, i.e., the arithmetic mean.

• Constant in range function $q(t) = 1$ for $t \le \beta$ and 0 otherwise, emphasizes the larger arguments, and generates the weights $w_i = \frac{1}{r}$, $i = 1,\ldots,r$ and $w_i = 0$, $i = r+1,\ldots,n$, where $r$ is the largest integer less than or equal to $\beta n$.

• Generating function $q(t) = 1$ for $\alpha < t \le \beta$ and 0 otherwise, emphasizes the “middle” arguments, and generates the weights $w_i = \frac{1}{p-r}$, $i = r+1,\ldots,p$ and 0 otherwise, with (for simplicity) $\alpha n = r$ and $\beta n = p$.

• Generating function with two tails $q(t) = 1$ if $t \in [0,\alpha]$ or $t \in [\beta,1]$ and 0 otherwise, emphasizes both large and small arguments and yields $w_i = \frac{1}{r_1 + r_2}$ for $i = 1,\ldots,r_1$ and $i = n+1-r_2,\ldots,n$, and $w_i = 0$, $i = r_1+1,\ldots,n-r_2$, with $r_1 = \alpha n$, $r_2 = \beta n$ integers.

• Linear stress function $q(t) = t$ generates weights $w_i = \frac{i}{\sum_{j=1}^{n} j} = \frac{2i}{n(n+1)}$, which gives orness value $\frac{1}{3}$, compare to Example 2.52. It emphasizes smaller arguments.
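Some of the cases in Example 2.71 can be verified with a short sketch of Eq. (2.44). The function name `stress_weights` is an assumption made for illustration.

```python
def stress_weights(q, n):
    """Normalized weights w_i = q(i/n) / sum_j q(j/n), Eq. (2.44)."""
    values = [q(i / n) for i in range(1, n + 1)]
    total = sum(values)
    if total <= 0:
        raise ValueError("q must be positive at some grid point i/n")
    return [v / total for v in values]

n = 4
print(stress_weights(lambda t: 1.0, n))                       # arithmetic mean
print(stress_weights(lambda t: 1.0 if t <= 0.5 else 0.0, n))  # constant in range
linear = stress_weights(lambda t: t, n)                       # w_i = 2i / (n(n+1))
print(linear)
print(sum(wi * (n - i) / (n - 1) for i, wi in enumerate(linear, start=1)))  # orness
```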


Of course, by using the same approach (i.e., $Q(t)$ or $q(t)$) one can generate the weights of generalized OWA and weighted quasi-arithmetic means. However, the interpretation and the limiting cases for the means will be different. For example, the weighting vector $\mathbf{w} = (1,0,\ldots,0)$ results not in the max function, but in the projection to the first coordinate, $f(\mathbf{x}) = x_1$.


Fitting weight generating functions 


Weight generating functions allow one to compute weighting vectors of OWA and weighted means for any number of arguments, i.e., to obtain extended aggregation functions in the sense of Definition 1.6. This is very convenient when the number of arguments is not known a priori. Next we pose the question as to whether it is possible to learn weight generating functions from empirical data, similarly to determining weighting vectors of aggregation functions of a fixed dimension.

A positive answer was provided in [20, 30]. The method consists in representing a weight generating function with a spline or polynomial, and fitting its coefficients by solving a least squares or least absolute deviation problem subject to a number of linear constraints. Consider a data set $(\mathbf{x}_k, y_k)$, $k = 1,\ldots,K$, where vectors $\mathbf{x}_k \in [0,1]^{n_k}$ need not have the same dimension (see Table 2.1). This is because we are dealing with an extended aggregation function, that is, a family of $n$-ary aggregation functions.


Table 2.1. A data set with inputs of varying dimension.


First, let us use the method of monotone splines, discussed in the Appendix. We write

\[ Q(t) = \sum_{j=1}^{J} c_j B_j(t), \ t \in \,]0,1[, \quad \text{and} \quad Q(0) = 0, \ Q(1) = 1, \]

where functions $B_j(t)$ constitute a convenient basis for polynomial splines, in which the condition of monotonicity of $Q$ is expressed as $c_j \ge 0$, $j = 1,\ldots,J$. We do not require $Q$ to be continuous on $[0,1]$ but only on $]0,1[$. We also have two linear constraints

\[ \sum_{j=1}^{J} c_j B_j(0) \ge 0, \qquad \sum_{j=1}^{J} c_j B_j(1) \le 1, \]

which convert to equalities if we want $Q$ to be continuous on $[0,1]$. Next put this expression in (2.42) to get

\begin{align*}
f(\mathbf{x}_k) &= \sum_{i=1}^{n_k} z_{ik}\left[Q\left(\frac{i}{n_k}\right) - Q\left(\frac{i-1}{n_k}\right)\right] \\
&= \sum_{i=2}^{n_k-1} z_{ik}\sum_{j=1}^{J} c_j\left[B_j\left(\frac{i}{n_k}\right) - B_j\left(\frac{i-1}{n_k}\right)\right]
 + z_{1k}\left[\sum_{j=1}^{J} c_j B_j\left(\frac{1}{n_k}\right) - 0\right]
 + z_{n_k k}\left[1 - \sum_{j=1}^{J} c_j B_j\left(\frac{n_k-1}{n_k}\right)\right] \\
&= \sum_{j=1}^{J} c_j\left[\sum_{i=2}^{n_k-1} z_{ik}\left(B_j\left(\frac{i}{n_k}\right) - B_j\left(\frac{i-1}{n_k}\right)\right)
 + z_{1k} B_j\left(\frac{1}{n_k}\right) - z_{n_k k} B_j\left(\frac{n_k-1}{n_k}\right)\right] + z_{n_k k} \\
&= \sum_{j=1}^{J} c_j A_j(\mathbf{x}_k) + z_{n_k k}.
\end{align*}

The vectors $\mathbf{z}_k$ stand for $\mathbf{x}_k$ when we treat weighted arithmetic means, or $\mathbf{x}_{k\searrow}$ when we deal with OWA functions. The entries $A_j(\mathbf{x}_k)$ are computed from $z_{ik}$ using the expression in the brackets. Note that if $Q$ is continuous on $[0,1]$ the expression simplifies to

\[ f(\mathbf{x}_k) = \sum_{j=1}^{J} c_j\left(\sum_{i=1}^{n_k} z_{ik}\left[B_j\left(\frac{i}{n_k}\right) - B_j\left(\frac{i-1}{n_k}\right)\right]\right). \]

Consider now the least squares approximation of empirical data. We obtain a quadratic programming problem

\[ \text{minimize } \sum_{k=1}^{K}\left(\sum_{j=1}^{J} c_j A_j(\mathbf{x}_k) + z_{n_k k} - y_k\right)^{2} \tag{2.45} \]
\[ \text{s.t. } \sum_{j=1}^{J} c_j B_j(0) \ge 0, \quad \sum_{j=1}^{J} c_j B_j(1) \le 1, \quad c_j \ge 0. \]

The problem is solved by the QP methods described in Appendix A.5.

OWA aggregation functions and weighted arithmetic means are special cases of Choquet integral based aggregation functions, described in the next section. Choquet integrals are defined with respect to a fuzzy measure (see Definition 2.75). When the fuzzy measure is additive, Choquet integrals become weighted arithmetic means, and when the fuzzy measure is symmetric, they become OWA functions. There are special classes of fuzzy measures called $k$-additive measures (see Definition 2.121). We will discuss them in detail in Section 2.6.3, and in the remainder of this section we will present a method for identifying weight generating functions that correspond to symmetric 2- and 3-additive fuzzy measures. These fuzzy measures lead to OWA functions with special weights distributions.


Proposition 2.72. A Choquet integral based aggregation function with respect to a symmetric 2-additive fuzzy measure is an OWA function whose weight generating function is given by

\[ Q(t) = at^{2} + (1-a)t \quad \text{for some } a \in [-1, 1]. \]

Furthermore, such an OWA weighting vector is equidistant (i.e., $w_{i+1} - w_i = const$ for all $i = 1,\ldots,n-1$).

A Choquet integral based aggregation function with respect to a symmetric 3-additive fuzzy measure is an OWA function whose weight generating function is given by

\[ Q(t) = at^{3} + bt^{2} + (1-a-b)t \quad \text{for some } a \in [-2, 4], \]

such that
• if $a \in [-2, 1]$ then $b \in [-2a-1, 1-a]$;
• if $a \in \,]1, 4]$ then $b \in [-3a/2 - \sqrt{3a(4-a)/4}, \ -3a/2 + \sqrt{3a(4-a)/4}]$.

Proposition 2.72 provides two classes of OWA functions that correspond to 2- and 3-additive symmetric fuzzy measures. In these cases, rather than fitting a general monotone non-decreasing function, we fit a quadratic or cubic function, identified by parameters $a$ and $b$.


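Interestingly, in the case of a 2-additive symmetric fuzzy measure, we obtain the following formula, a linear combination of OWA and the arithmetic mean

\[ f(\mathbf{x}) = a\,OWA_{\mathbf{w}}(\mathbf{x}) + (1-a)M(\mathbf{x}), \]

with $\mathbf{w} = (\frac{1}{n^{2}}, \frac{3}{n^{2}}, \ldots, \frac{2n-1}{n^{2}})$. In this case the solution is explicit; the optimal $a$ is given by

\[ a = \max\left\{-1, \min\left\{1, \frac{\sum_{k=1}^{K}(y_k - U_k)V_k}{\sum_{k=1}^{K} V_k^{2}}\right\}\right\}, \]

where

\[ U_k = \frac{1}{n_k}\sum_{i=1}^{n_k} z_{ik}, \quad \text{and} \quad V_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\left(\frac{2i-1}{n_k} - 1\right) z_{ik}. \]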
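The explicit formula for the 2-additive case can be sketched as follows. The function names `two_additive_owa` and `fit_two_additive`, the sample inputs, and the choice to return 0 when every $V_k$ vanishes are assumptions made for illustration. The data set deliberately contains vectors of different dimensions.

```python
def two_additive_owa(x, a):
    """OWA generated by Q(t) = a t^2 + (1 - a) t, i.e. a*OWA_w(x) + (1-a)*M(x)."""
    n = len(x)
    z = sorted(x, reverse=True)
    return sum((a * (2 * i - 1) / n ** 2 + (1 - a) / n) * zi
               for i, zi in enumerate(z, start=1))

def fit_two_additive(data):
    """Least squares estimate of a, clipped to [-1, 1]."""
    num = den = 0.0
    for x, y in data:
        n = len(x)
        z = sorted(x, reverse=True)
        u = sum(z) / n                                            # U_k
        v = sum(((2 * i - 1) / n - 1) * zi
                for i, zi in enumerate(z, start=1)) / n           # V_k
        num += (y - u) * v
        den += v * v
    if den == 0:
        return 0.0            # assumption: data carry no information about a
    return max(-1.0, min(1.0, num / den))

a_true = 0.4
inputs = [[0.2, 0.9], [0.5, 0.1, 0.7], [0.3, 0.8, 0.6, 0.4]]
data = [(x, two_additive_owa(x, a_true)) for x in inputs]
print(fit_two_additive(data))
```

With noise-free data the estimate coincides with the parameter used to generate the outputs, up to rounding.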


For 3-additive symmetric fuzzy measures the solution is found numerically by solving a convex optimization problem in the feasible domain $D$ in Proposition 2.72, which is the intersection of a polytope and an ellipse. Details can be found in the literature.

2.5.6 Choosing parameters of generalized OWA
Choosing weight generating functions 


Consider the case of generalized OWA functions, where the generating function $g$ is given. As earlier, we have a data set $(\mathbf{x}_k, y_k)$, $k = 1,\ldots,K$, and we are interested in finding the weighting vector $\mathbf{w}$ that fits the data best. When we use the least squares criterion, as discussed on p. 33, we have the following optimization problem

\[ \min \sum_{k=1}^{K}\left(g^{-1}\left(\sum_{i=1}^{n} w_i g(z_{ik})\right) - y_k\right)^{2} \tag{2.46} \]
\[ \text{s.t. } \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0, \ i = 1,\ldots,n, \]

where $\mathbf{z}_k = \mathbf{x}_{k\searrow}$ (see Section 2.5.5). This problem is converted to a QP problem similarly to the case of weighted quasi-arithmetic means:

\[ \min \sum_{k=1}^{K}\left(\sum_{i=1}^{n} w_i g(z_{ik}) - g(y_k)\right)^{2} \tag{2.47} \]
\[ \text{s.t. } \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0, \ i = 1,\ldots,n. \]

This is a standard convex QP problem, and the solution methods are discussed in Appendix A.5. This approach is presented in [20].

If one uses the least absolute deviation (LAD) criterion (p. 33) we obtain a different optimization problem

\[ \min \sum_{k=1}^{K}\left|\sum_{i=1}^{n} w_i g(z_{ik}) - g(y_k)\right| \tag{2.48} \]
\[ \text{s.t. } \sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0, \ i = 1,\ldots,n. \]

This problem is subsequently converted into a linear programming problem as discussed in Appendix A.2.

As in the case of weighted quasi-arithmetic means, in the presence of multiple optimal solutions, one can use an additional criterion of the dispersion of weights [235].


Preservation of ordering of the outputs 


If we require that the ordering of the outputs be preserved, i.e., if $y_j \le y_k$ then
we expect $f(\mathbf{x}_j) \le f(\mathbf{x}_k)$ (see Section 1.6, p. 34), then we arrange the data so
that the outputs are in non-decreasing order, $y_k \le y_{k+1}$, $k = 1,\ldots,K-1$.
Then we define the additional linear constraints

$$\langle g(\mathbf{z}_{k+1}) - g(\mathbf{z}_k), \mathbf{w}\rangle = \sum_{i=1}^{n} w_i\left(g(z_{i,k+1}) - g(z_{ik})\right) \ge 0,$$

$k = 1,\ldots,K-1$. We add the above constraints to problem (2.47) or (2.48)
and solve the modified problem.


Choosing generating functions 


Consider now the case where the generating function $g$ is also unknown, and
hence it has to be found based on the data. We study two cases: a) when $g$ is
given algebraically, with one or more unknown parameters to estimate (e.g.,
$g_r(t) = t^r$, $r$ unknown), and b) when no specific algebraic form of $g$ is given.
In the first case we solve the problem

$$\begin{aligned}
\min_{r,\mathbf{w}}\ & \sum_{k=1}^{K}\left(\sum_{i=1}^{n} w_i\, g_r(z_{ik}) - g_r(y_k)\right)^2 \\
\text{s.t. } & \sum_{i=1}^{n} w_i = 1,\quad w_i \ge 0,\ i = 1,\ldots,n,
\end{aligned} \tag{2.49}$$

plus conditions on $r$.


While this general optimization problem is non-convex and nonlinear (i.e.,
difficult to solve), we can convert it to a bi-level optimization problem (see
Appendix A.5.3)

$$\begin{aligned}
\min_{r}\ & \left[\min_{\mathbf{w}} \sum_{k=1}^{K}\left(\sum_{i=1}^{n} w_i\, g_r(z_{ik}) - g_r(y_k)\right)^2\right] \\
\text{s.t. } & \sum_{i=1}^{n} w_i = 1,\quad w_i \ge 0,\ i = 1,\ldots,n,
\end{aligned} \tag{2.50}$$

plus conditions on $r$.

The problem at the inner level is the same as with a fixed $g_r$, which
is a QP problem. At the outer level we have a global optimization problem
with respect to a single parameter $r$. It is solved by using one of the methods
discussed in the Appendix. We recommend the deterministic Pijavski-Shubert
method.
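The structure of the bi-level problem (2.50) can be illustrated on a deliberately small case. The sketch below makes several simplifying assumptions that are not part of the text: there are only $n = 2$ inputs (so the inner QP reduces to a one-dimensional least squares problem whose solution is clipped to $[0,1]$), $g_r(t) = t^r$, all data are strictly positive, and the outer level uses a plain grid search over $r$ in place of the Pijavski-Shubert method. All function names are hypothetical.

```python
def gen_owa(x, w1, r):
    """Generalized OWA with g(t) = t^r for two inputs; weights (w1, 1 - w1)."""
    z1, z2 = sorted(x, reverse=True)
    return (w1 * z1**r + (1.0 - w1) * z2**r) ** (1.0 / r)


def inner_fit(data, r):
    """Inner level: best w1 in [0, 1] for a fixed r, in the linearized least squares sense."""
    rows = []
    num = den = 0.0
    for x, y in data:
        z1, z2 = sorted(x, reverse=True)
        d = z1**r - z2**r      # coefficient of w1 in the residual
        e = y**r - z2**r       # right-hand side
        rows.append((d, e))
        num += d * e
        den += d * d
    # A one-dimensional convex quadratic: clipping the free minimizer gives the constrained one.
    w1 = 0.5 if den == 0.0 else max(0.0, min(1.0, num / den))
    sse = sum((w1 * d - e) ** 2 for d, e in rows)
    return w1, sse


def fit_gen_owa(data, r_grid):
    """Outer level: grid search over r (a stand-in for a global optimization method)."""
    best = None
    for r in r_grid:
        w1, sse = inner_fit(data, r)
        if best is None or sse < best[2]:
            best = (r, w1, sse)
    return best


inputs = [(0.2, 0.9), (0.5, 0.4), (0.7, 0.1), (0.3, 0.8), (0.6, 0.95)]
data = [(x, gen_owa(x, 0.3, 2.0)) for x in inputs]
r_hat, w1_hat, sse = fit_gen_owa(data, [i / 10 for i in range(1, 51)])
print(r_hat, w1_hat)
```

For $n > 2$ the inner level would need a proper QP solver, and a grid search gives no guarantee of finding the global minimum between grid points.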


Example 2.73. Determine the weights and the generating function of a family
of generalized OWA based on the power function, subject to a given measure
of orness $\alpha$. We have $g_r(t) = t^r$, and hence solve the bi-level optimization problem

$$\begin{aligned}
\min_{r}\ \min_{\mathbf{w}}\ & \sum_{k=1}^{K}\left(\sum_{i=1}^{n} w_i\, z_{ik}^r - y_k^r\right)^2 \\
\text{s.t. } & \sum_{i=1}^{n} w_i = 1,\quad w_i \ge 0,\ i = 1,\ldots,n, \\
& \sum_{i=1}^{n} w_i\,\frac{n-i}{n-1} = \alpha.
\end{aligned} \tag{2.51}$$

Of course, for numerical purposes we need to limit the range for $r$ to a finite
interval, and treat all the limiting cases $r \to \pm\infty$, $r \to 0$ and $r \to -1$.





A different situation arises when the parametric form of g is not given. The 
approach proposed in is based on approximation of g with a monotone 
linear spline, as 


$$g(t) = \sum_{j=1}^{J} c_j B_j(t), \tag{2.52}$$


where Bj are appropriately chosen basis functions, and cj are spline coef- 
ficients. The monotonicity of g is ensured by imposing linear restrictions 
on spline coefficients, in particular non-negativity, as in ia. Further, since 
the generating function is defined up to an arbitrary linear transformation,
one has to fix a particular $g$ by specifying two interpolation conditions, like
$g(a) = 0$, $g(b) = 1$, $a, b \in\, ]0,1[$, and if necessary, properly model asymptotic
behavior if $g(0)$ or $g(1)$ are infinite.

After rearranging the terms of the sum, the problem of identification
becomes (subject to linear conditions on $\mathbf{c}$, $\mathbf{w}$)

$$\min_{\mathbf{c},\mathbf{w}} \sum_{k=1}^{K}\left(\sum_{j=1}^{J} c_j\left[\sum_{i=1}^{n} w_i B_j(z_{ik}) - B_j(y_k)\right]\right)^2. \tag{2.53}$$

For a fixed $\mathbf{c}$ (i.e., fixed $g$) we have a quadratic programming problem to find
$\mathbf{w}$, and for a fixed $\mathbf{w}$, we have a quadratic programming problem to find $\mathbf{c}$.
However if we consider both $\mathbf{c}, \mathbf{w}$ as variables, we obtain a difficult global
optimization problem. We convert it to a bi-level optimization problem

$$\min_{\mathbf{c}}\ \min_{\mathbf{w}} \sum_{k=1}^{K}\left(\sum_{j=1}^{J} c_j\left[\sum_{i=1}^{n} w_i B_j(z_{ik}) - B_j(y_k)\right]\right)^2,$$

where at the inner level we have a QP problem and at the outer level we have
a nonlinear problem with multiple local minima. When the number of spline
coefficients $J$ is not very large (< 10), this problem can be efficiently solved
by using deterministic global optimization methods from Appendix A.5.5. If
the number of variables is small and $J$ is large, then reversing the order of
minimization (i.e., using $\min_{\mathbf{w}} \min_{\mathbf{c}}$) is more efficient.





Choosing generating functions and weight generating functions 


We recall the definition of the generalized OWA. Consider the case of
generating function $g(t) = t^r$, in which case

$$\mathrm{GenOWA}_{\mathbf{w},[r]}(\mathbf{x}) = \left(\sum_{i=1}^{n} w_i\, x_{(i)}^r\right)^{1/r}.$$

Consider first a fixed $r$. To find a weight generating function $Q(t)$, we first
linearize the least squares problem to get

$$\begin{aligned}
\min_{\mathbf{c}}\ & \sum_{k=1}^{K}\left(\sum_{j=1}^{J} c_j\left[\sum_{i=1}^{n_k}\left(B_j\!\left(\tfrac{i}{n_k}\right) - B_j\!\left(\tfrac{i-1}{n_k}\right)\right)(z_{ik})^r\right] - y_k^r\right)^2 \\
\text{s.t. } & \sum_{j=1}^{J} c_j B_j(0) = 0,\quad \sum_{j=1}^{J} c_j B_j(1) = 1, \\
& c_j \ge 0.
\end{aligned}$$

This is a standard QP, which differs from (2.45) because $z_{ik}$ and $y_k$ are raised
to power $r$ (we considered a simpler case of $Q(t)$ continuous on $[0, 1]$).

Now, if the parameter $r$ is also unknown, we determine it from data by
setting a bi-level optimization problem

$$\begin{aligned}
\min_{r}\ \min_{\mathbf{c}}\ & \sum_{k=1}^{K}\left(\sum_{j=1}^{J} c_j\left[\sum_{i=1}^{n_k}\left(B_j\!\left(\tfrac{i}{n_k}\right) - B_j\!\left(\tfrac{i-1}{n_k}\right)\right)(z_{ik})^r\right] - y_k^r\right)^2 \\
\text{s.t. } & \sum_{j=1}^{J} c_j B_j(0) = 0,\quad \sum_{j=1}^{J} c_j B_j(1) = 1, \\
& c_j \ge 0,
\end{aligned}$$

in which at the inner level we solve a QP problem with a fixed $r$, and at the
outer level we optimize with respect to a single nonlinear parameter $r$, in the
same way we did in Example 2.73.

For the more complicated case of both generating functions $g$ and $Q$ given
non-parametrically (as splines) we refer to [20].


2.6 Choquet Integral 


2.6.1 Semantics 


In this section we present a large family of aggregation functions based on 
Choquet integrals. The Choquet integral generalizes the Lebesgue integral, 
and like it, is defined with respect to a measure. Informally, a measure is a 
function used to measure, in some sense, sets of objects (finite or infinite). 
For example, the length of an interval on the real line is an example of a 
measure, applicable to subsets of real numbers. The area or the volume are 
other examples of simple measures. A broad overview of various measures is 
nee Bsa) 

We note that measures can be additive (the measure of a set is the sum of 
the measures of its non-intersecting subsets) or non-additive. Lengths, areas 
and volumes are examples of additive measures. Lebesgue integration is de- 
fined with respect to additive measures. If a measure is non-additive, then the 
measure of the total can be larger or smaller than the sum of the measures of
its components. 

Choquet integration is defined with respect to not necessarily additive
monotone measures, called fuzzy measures (see Definition 2.75 below), or
capacities [53]. In this book we are interested only in discrete fuzzy measures,
which are defined on finite discrete subsets. This is because our construction
of aggregation functions involves a finite set of inputs. In general, Choquet
integrals (and also various other fuzzy integrals) are defined for measures
on general sets, and we refer the reader to the extensive literature.

The main purpose of Choquet integral-based aggregation is to combine the
inputs in such a way that not only the importance of individual inputs (as in
weighted means), or of their magnitude (as in OWA), are taken into account,
but also of their groups (or coalitions). For example, a particular input may
not be important by itself, but become very important in the presence of some
other inputs. In medical diagnosis, for instance, some symptoms by themselves
may not be really important, but may become key factors in the presence of
other signs.

A discrete fuzzy measure allows one to assign importances to all possible 
groups of criteria, and thus offers a much greater flexibility for modeling ag- 
gregation. It also turns out that weighted arithmetic means and OWA are 
special cases of Choquet integrals with respect to additive and symmetric 
fuzzy measures respectively. Thus we deal with a much broader class of ag- 
gregation functions. The uses of Choquet integrals as aggregation functions 


are documented in (106, kd, mu, [163 165. 


Example 2.74. [106] Consider the problem of evaluating students in a high
school with respect to three subjects: mathematics (M), physics (P) and lit- 
erature (L). Usually this is done by using a weighted arithmetic mean, whose 
weights are interpreted as importances of different subjects. However, stu- 
dents that are good at mathematics are usually also good at physics and vice 
versa, as these disciplines present some overlap. Thus evaluation by a weighted 
arithmetic mean will be either overestimated or underestimated for students 
good at mathematics and/or physics, depending on the weights. 

Consider three students a,b and c whose marks on the scale from 0 to 20 
are given by 


Student    M    P    L
   a      18   16   10
   b      10   12   18
   c      14   15   15





Suppose that the school is more scientifically oriented, so it weights M and P
more than L, with the weights $w_M = w_P > w_L$. If the school wants to favor
well equilibrated students, then student $c$ should be considered better than $a$,
who has weakness in L. However, there is no weighting vector $\mathbf{w}$, such that
$w_M = w_P > w_L$, and $M_{\mathbf{w}}(c_M, c_P, c_L) > M_{\mathbf{w}}(a_M, a_P, a_L)$.

By aggregating scores using the Choquet integral, it is possible (see Example
2.84 below) to construct such a fuzzy measure, that the weights of individual
subjects satisfy the requirement $w_M = w_P > w_L$, but the weight attributed
to the pair (M,P) is less than the sum $w_M + w_P$, and the well equilibrated
student $c$ is favored.


2.6.2 Definitions and properties 


Definition 2.75 (Fuzzy measure). Let $\mathcal{N} = \{1,2,\ldots,n\}$. A discrete fuzzy
measure is a set function¹⁵ $v : 2^{\mathcal{N}} \to [0,1]$ which is monotonic (i.e., $v(A) \le v(B)$
whenever $A \subseteq B$) and satisfies $v(\emptyset) = 0$ and $v(\mathcal{N}) = 1$.


Fuzzy measures are interpreted from various points of view, and are used, 
in particular, to model uncertainty [zi [sdl md esd. In the context of aggre- 
gation functions, we are interested in another interpretation, the importance 
of a coalition, which is used in game theory and in multi-criteria decision 
making. In Definition 2.75, a subset $A \subseteq \mathcal{N}$ can be considered as a coalition,
so that $v(A)$ gives us an idea about the importance or the weight of this
coalition. The monotonicity condition implies that adding new elements to a
coalition does not decrease its weight.
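The conditions of Definition 2.75 are easy to check mechanically. The following Python sketch is illustrative only: it assumes a set function is stored as a dictionary keyed by `frozenset` objects over the indices 1..n, and the names `subsets` and `is_fuzzy_measure` are hypothetical, not taken from the text.

```python
from itertools import combinations


def subsets(n):
    """All subsets of {1, ..., n} as frozensets, ordered by cardinality."""
    items = range(1, n + 1)
    return [frozenset(c) for k in range(n + 1) for c in combinations(items, k)]


def is_fuzzy_measure(v, n, tol=1e-12):
    """Check the boundary conditions and monotonicity of a set function v."""
    full = frozenset(range(1, n + 1))
    if abs(v[frozenset()]) > tol or abs(v[full] - 1.0) > tol:
        return False
    # Monotonicity follows from checking the addition of one element at a time.
    for a in subsets(n):
        for i in full - a:
            if v[a | {i}] < v[a] - tol:
                return False
    return True


n = 3
full = frozenset(range(1, n + 1))
# A measure that is 1 on the full set and 0 elsewhere, and one that is 0 only on the empty set.
weakest = {a: (1.0 if a == full else 0.0) for a in subsets(n)}
strongest = {a: (0.0 if not a else 1.0) for a in subsets(n)}
print(is_fuzzy_measure(weakest, n), is_fuzzy_measure(strongest, n))
```

Checking single-element extensions is sufficient because any inclusion $A \subseteq B$ can be reached by adding the elements of $B \setminus A$ one by one.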


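Example 2.76. The weakest and the strongest fuzzy measures are, respectively,

1. $v(A) = \begin{cases} 1, & \text{if } A = \mathcal{N}, \\ 0 & \text{otherwise;} \end{cases}$

2. $v(A) = \begin{cases} 0, & \text{if } A = \emptyset, \\ 1 & \text{otherwise.} \end{cases}$


Example 2.77. The Dirac measure is given for any $A \subseteq \mathcal{N}$ by

$$v(A) = \begin{cases} 1, & \text{if } x_0 \in A, \\ 0, & \text{if } x_0 \notin A, \end{cases}$$

where $x_0$ is a fixed element in $\mathcal{N}$.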


Example 2.78. The expression

$$v(A) = \left(\frac{|A|}{n}\right)^2,$$

where $|A|$ is the number of elements in $A$, is a fuzzy measure.


15 A set function is a function whose domain consists of all possible subsets of $\mathcal{N}$.
For example, for $n = 3$, a set function is specified by $2^3 = 8$ values at $v(\emptyset)$, $v(\{1\})$,
$v(\{2\})$, $v(\{3\})$, $v(\{1,2\})$, $v(\{1,3\})$, $v(\{2,3\})$, $v(\{1,2,3\})$.

Definition 2.79 (Möbius transformation). Let $v$ be a fuzzy measure.¹⁶
The Möbius transformation of $v$ is a set function defined for every $A \subseteq \mathcal{N}$ as

$$\mathcal{M}(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, v(B).$$

The Möbius transformation is invertible, and one recovers $v$ by using its inverse,
called the Zeta transform,

$$v(A) = \sum_{B \subseteq A} \mathcal{M}(B) \quad \forall A \subseteq \mathcal{N}.$$

The Möbius transformation is helpful in expressing various quantities, like the
interaction indices discussed in Section 2.6.4, in a more compact form [107,
108, 165]. It also serves as an alternative representation of a fuzzy measure,
called the Möbius representation. That is, one can either use $v$ or $\mathcal{M}$ to perform
calculations, whichever is more convenient. The conditions of monotonicity
of a fuzzy measure, and the boundary conditions $v(\emptyset) = 0$, $v(\mathcal{N}) = 1$ are
expressed, respectively, as

$$\sum_{B \subseteq A \,|\, i \in B} \mathcal{M}(B) \ge 0, \quad \text{for all } A \subseteq \mathcal{N} \text{ and all } i \in A, \tag{2.54}$$

$$\mathcal{M}(\emptyset) = 0 \quad\text{and}\quad \sum_{A \subseteq \mathcal{N}} \mathcal{M}(A) = 1.$$


To represent set functions (for a small $n$), it is convenient to arrange their
values into an array,¹⁷ e.g., for $n = 3$

                v({1,2,3})
    v({1,2})    v({1,3})    v({2,3})
    v({1})      v({2})      v({3})
                v(∅)


Example 2.80. Let $v$ be the fuzzy measure on $\mathcal{N} = \{1, 2, 3\}$ given by

           1
    0.9   0.5   0.3
    0.5   0     0.3
           0


Its Möbius representation $\mathcal{M}$ is

           0.1
    0.4   −0.3   0
    0.5    0     0.3
           0


16 In general, this definition applies to any set function.
17 Such an array is based on a Hasse diagram of the inclusion relation defined on
the set of subsets of $\mathcal{N}$.


Note 2.81. Observe that the sum of all values of the Möbius transformation in the
above example is equal to 1, in accordance with (2.54). The values of $v$ and $\mathcal{M}$
coincide on singletons.
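The Möbius and Zeta transforms of Definition 2.79 can be reproduced in a few lines. The sketch below is an illustration, assuming set functions are stored as dictionaries keyed by `frozenset`; the helper names `subsets_of`, `mobius` and `zeta` are hypothetical. It uses the measure of Example 2.80 as input.

```python
from itertools import combinations


def subsets_of(s):
    """All subsets of the iterable s, as frozensets."""
    s = sorted(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]


def mobius(v, n):
    """M(A) = sum over B subset of A of (-1)^{|A \\ B|} v(B)."""
    full = range(1, n + 1)
    return {a: sum((-1) ** (len(a) - len(b)) * v[b] for b in subsets_of(a))
            for a in subsets_of(full)}


def zeta(m, n):
    """Inverse transform: v(A) = sum over B subset of A of M(B)."""
    full = range(1, n + 1)
    return {a: sum(m[b] for b in subsets_of(a)) for a in subsets_of(full)}


def fs(*items):
    return frozenset(items)


# The fuzzy measure of Example 2.80.
v = {fs(): 0.0, fs(1): 0.5, fs(2): 0.0, fs(3): 0.3,
     fs(1, 2): 0.9, fs(1, 3): 0.5, fs(2, 3): 0.3, fs(1, 2, 3): 1.0}

m = mobius(v, 3)
for a in subsets_of(range(1, 4)):
    print(sorted(a), round(m[a], 10))
```

Both transforms enumerate all subsets of every set, so the cost grows as $3^n$; this is adequate only for small $n$.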


There are various special classes of fuzzy measures, which we discuss in
Section 2.6.3. We now proceed with the definition of the Choquet integral-based
aggregation functions.


Definition 2.82 (Discrete Choquet integral). The discrete Choquet
integral with respect to a fuzzy measure $v$ is given by

$$C_v(\mathbf{x}) = \sum_{i=1}^{n} x_{(i)}\left[v(\{j \mid x_j \ge x_{(i)}\}) - v(\{j \mid x_j \ge x_{(i+1)}\})\right], \tag{2.55}$$

where $\mathbf{x}_{\nearrow} = (x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is a non-decreasing permutation of the input
$\mathbf{x}$, and $x_{(n+1)} = \infty$ by convention.

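Alternative expressions


• By rearranging the terms of the sum, (2.55) can also be written as

$$C_v(\mathbf{x}) = \sum_{i=1}^{n}\left[x_{(i)} - x_{(i-1)}\right] v(H_i), \tag{2.56}$$

where $x_{(0)} = 0$ by convention, and $H_i = \{(i), \ldots, (n)\}$ is the subset of
indices of the $n - i + 1$ largest components of $\mathbf{x}$.

• The discrete Choquet integral is a linear function of the values of the fuzzy
measure $v$. Let us define the following function. For each $A \subseteq \mathcal{N}$ let

$$g_A(\mathbf{x}) = \max\left(0, \min_{i \in A} x_i - \max_{i \in \mathcal{N} \setminus A} x_i\right). \tag{2.57}$$

The maximum and minimum over an empty set are taken as 0. Note that
$g_A(\mathbf{x}) = 0$ unless $A$ is the subset of indices of the $k$ largest components of
$\mathbf{x}$, in which case $g_A(\mathbf{x})$ is the difference between the $k$-th and the $(k+1)$-th
largest components of $\mathbf{x}$. Then it is a matter of simple calculation to show that

$$C_v(\mathbf{x}) = \sum_{A \subseteq \mathcal{N}} v(A)\, g_A(\mathbf{x}). \tag{2.58}$$

• The Choquet integral can be expressed with the help of the Möbius
transformation as

$$C_v(\mathbf{x}) = \sum_{A \subseteq \mathcal{N}} \mathcal{M}(A) \min_{i \in A} x_i = \sum_{A \subseteq \mathcal{N}} \mathcal{M}(A)\, h_A(\mathbf{x}), \tag{2.59}$$

with $h_A(\mathbf{x}) = \min_{i \in A} x_i$. By using Definition 2.79 we obtain

$$C_v(\mathbf{x}) = \sum_{A \subseteq \mathcal{N}} v(A) \sum_{B \mid A \subseteq B} (-1)^{|B \setminus A|} \min_{i \in B} x_i. \tag{2.60}$$

By comparing this expression with (2.58) we obtain

$$g_A(\mathbf{x}) = \max\left(0, \min_{i \in A} x_i - \max_{i \in \mathcal{N} \setminus A} x_i\right) = \sum_{B \mid A \subseteq B} (-1)^{|B \setminus A|}\, h_B(\mathbf{x}). \tag{2.61}$$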
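The two representations (2.58) and (2.59) can be compared numerically. The following sketch is only an illustration: the dictionary-of-`frozenset` storage and the function names are assumptions, and the data are the measure of Example 2.80 together with its Möbius representation.

```python
def fs(*items):
    return frozenset(items)


def g(a, x):
    """g_A(x) from (2.57); min and max over an empty set are taken as 0."""
    inside = [x[i - 1] for i in a]
    outside = [x[i - 1] for i in range(1, len(x) + 1) if i not in a]
    lo = min(inside) if inside else 0.0
    hi = max(outside) if outside else 0.0
    return max(0.0, lo - hi)


def choquet_from_measure(v, x):
    """Formula (2.58): sum of v(A) g_A(x) over all subsets A."""
    return sum(val * g(a, x) for a, val in v.items())


def choquet_from_mobius(m, x):
    """Formula (2.59): sum of M(A) min_{i in A} x_i over non-empty subsets A."""
    return sum(val * min(x[i - 1] for i in a) for a, val in m.items() if a)


v = {fs(): 0.0, fs(1): 0.5, fs(2): 0.0, fs(3): 0.3,
     fs(1, 2): 0.9, fs(1, 3): 0.5, fs(2, 3): 0.3, fs(1, 2, 3): 1.0}
m = {fs(): 0.0, fs(1): 0.5, fs(2): 0.0, fs(3): 0.3,
     fs(1, 2): 0.4, fs(1, 3): -0.3, fs(2, 3): 0.0, fs(1, 2, 3): 0.1}

x = [0.8, 0.1, 0.6]
print(choquet_from_measure(v, x), choquet_from_mobius(m, x))
```

For this input only three sets contribute in (2.58): the full set, $\{1,3\}$ and $\{1\}$, the sets of indices of the 3, 2 and 1 largest components.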


Main properties


• The Choquet integral is a continuous piecewise linear idempotent
aggregation function;

• An aggregation function is a Choquet integral if and only if it is
homogeneous, shift-invariant and comonotone additive, i.e., $C_v(\mathbf{x} + \mathbf{y}) =
C_v(\mathbf{x}) + C_v(\mathbf{y})$ for all comonotone¹⁸ $\mathbf{x}, \mathbf{y}$;

• The Choquet integral is uniquely defined by its values at the vertices of
the unit cube $[0,1]^n$, i.e., at the points $\mathbf{x}$ whose coordinates $x_i \in \{0,1\}$.
Note that there are $2^n$ such points, the same as the number of values that
determine the fuzzy measure $v$;

• The Choquet integral is Lipschitz-continuous, with the Lipschitz constant 1 in
any $p$-norm, which means it is a kernel aggregation function, see Definition
1.62, p. 23;

• The class of Choquet integrals includes weighted means and OWA
functions, as well as minimum, maximum and order statistics as special cases
(see Section 2.6.5 below);

• A linear convex combination of Choquet integrals with respect to fuzzy
measures $v_1$ and $v_2$, $\alpha C_{v_1} + (1-\alpha) C_{v_2}$, $\alpha \in [0,1]$, is also a Choquet integral
with respect to $v = \alpha v_1 + (1-\alpha) v_2$;¹⁹

• A pointwise maximum or minimum of Choquet integrals is not necessarily
a Choquet integral (but it is in the bivariate case);

• The class of Choquet integrals is closed under duality;

• Choquet integrals have neutral and absorbent elements only in the limiting
cases of min and max.


18 Two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ are called comonotone if there exists a common
permutation $P$ of $\{1,2,\ldots,n\}$, such that $x_{P(1)} \le x_{P(2)} \le \cdots \le x_{P(n)}$ and
$y_{P(1)} \le y_{P(2)} \le \cdots \le y_{P(n)}$. Equivalently, this condition is frequently expressed
as $(x_i - x_j)(y_i - y_j) \ge 0$ for all $i, j \in \{1,\ldots,n\}$.
19 As a consequence, this property holds for a linear convex combination of any
number of fuzzy measures.


Other properties of Choquet integrals depend on the fuzzy measure being
used. We discuss them in Section 2.6.3.


Calculation


Calculation of the discrete Choquet integral is performed using Equation
(2.56) and the following procedure. Consider the vector of pairs
$((x_1, 1), (x_2, 2), \ldots, (x_n, n))$, where the second component of each pair is just
the index $i$ of $x_i$. The second component will help keep track of all permutations.


Calculation of $C_v(\mathbf{x})$.


1. Sort the components of $((x_1, 1), (x_2, 2), \ldots, (x_n, n))$ with respect to the
   first component of each pair in non-decreasing order. We obtain
   $((x_{(1)}, i_1), (x_{(2)}, i_2), \ldots, (x_{(n)}, i_n))$, so that $x_{(j)} = x_{i_j}$ and $x_{(j)} \le x_{(j+1)}$
   for all $j$. Let also $x_{(0)} = 0$.

2. Let $T = \{1,\ldots,n\}$, and $S = 0$.

3. For $j = 1,\ldots,n$ do
   a) $S := S + [x_{(j)} - x_{(j-1)}]\, v(T)$;
   b) $T := T \setminus \{i_j\}$.

4. Return $S$.

Example 2.83. Let $n = 3$, values of $v$ be given and $\mathbf{x} = (0.8, 0.1, 0.6)$.

Step 1. We take ((0.8, 1), (0.1, 2), (0.6, 3)).
        Sort this vector of pairs to obtain ((0.1, 2), (0.6, 3), (0.8, 1)).
Step 2. Take $T = \{1,2,3\}$ and $S = 0$.
Step 3. a) $S := 0 + [0.1 - 0]\,v(\{1,2,3\}) = 0.1 \times 1 = 0.1$;
        b) $T := \{1,2,3\} \setminus \{2\} = \{1,3\}$;
        a) $S := 0.1 + [0.6 - 0.1]\,v(\{1,3\}) = 0.1 + 0.5\,v(\{1,3\})$;
        b) $T := \{1,3\} \setminus \{3\} = \{1\}$;
        a) $S := [0.1 + 0.5\,v(\{1,3\})] + [0.8 - 0.6]\,v(\{1\})$.

Therefore $C_v(\mathbf{x}) = 0.1 + 0.5\,v(\{1,3\}) + 0.2\,v(\{1\})$.
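The procedure above translates almost line by line into code. The following Python sketch is one possible rendering, not a reference implementation: it assumes the fuzzy measure is supplied as a dictionary keyed by `frozenset` of 1-based indices, and the function name `choquet` is hypothetical. To obtain a number it plugs in the measure of Example 2.80, for which $v(\{1,3\}) = 0.5$ and $v(\{1\}) = 0.5$.

```python
def choquet(v, x):
    """Discrete Choquet integral of x with respect to v, following formula (2.56)."""
    # Step 1: sort (value, index) pairs by value, non-decreasing.
    pairs = sorted((xi, i) for i, xi in enumerate(x, start=1))
    # Step 2: T starts as the full index set, S as 0; prev plays the role of x_(0) = 0.
    remaining = set(range(1, len(x) + 1))
    total, prev = 0.0, 0.0
    # Step 3: accumulate [x_(j) - x_(j-1)] v(T), then remove the index just used.
    for xi, i in pairs:
        total += (xi - prev) * v[frozenset(remaining)]
        remaining.discard(i)
        prev = xi
    # Step 4.
    return total


def fs(*items):
    return frozenset(items)


v = {fs(): 0.0, fs(1): 0.5, fs(2): 0.0, fs(3): 0.3,
     fs(1, 2): 0.9, fs(1, 3): 0.5, fs(2, 3): 0.3, fs(1, 2, 3): 1.0}

print(choquet(v, [0.8, 0.1, 0.6]))   # 0.1 + 0.5 * 0.5 + 0.2 * 0.5, up to rounding
```

The cost is dominated by the sort, so one evaluation takes $O(n \log n)$ operations plus $n$ dictionary lookups, even though the measure itself has $2^n$ values.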


For computational purposes it is convenient to store the values of a fuzzy
measure $v$ in an array $\mathbf{v}$ of size $2^n$, and to use the following indexing system,
which provides a one-to-one mapping between the subsets $\mathcal{J} \subseteq \mathcal{N}$ and the
set of integers $I = \{0,\ldots,2^n - 1\}$, which index the elements of $\mathbf{v}$. Take the
binary representation of each index in $I$, e.g. $j = 5 = 101$ (binary). Now for a
given subset $\mathcal{J} \subseteq \mathcal{N} = \{1,\ldots,n\}$ define its characteristic vector $\mathbf{c} \in \{0,1\}^n$:
$c_{n-i+1} = 1$ if $i \in \mathcal{J}$ and 0 otherwise. For example, if $n = 5$, $\mathcal{J} = \{1,3\}$, then
$\mathbf{c} = (0,0,1,0,1)$. Put the value $v(\mathcal{J})$ into correspondence with $v_j$, so that the
binary representation of $j$ corresponds to the characteristic vector of $\mathcal{J}$. In
our example $v(\{1,3\}) = v_5$.

Such an ordering of the subsets of $\mathcal{N}$ is called binary ordering:

$$\emptyset, \{1\}, \{2\}, \{1,2\}, \{3\}, \{1,3\}, \{2,3\}, \{1,2,3\}, \{4\}, \ldots, \{1,2,\ldots,n\}.$$


The values of $v$ are mapped to the elements of vector $\mathbf{v}$ as follows

    v_0        v_1        v_2        v_3        v_4        v_5        ...
    = v(0000)  = v(0001)  = v(0010)  = v(0011)  = v(0100)  = v(0101)  ...
    v(∅)       v({1})     v({2})     v({1,2})   v({3})     v({1,3})   ...
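The binary indexing amounts to setting bit $i-1$ of the index for every element $i$ of the subset. A minimal sketch follows; the function names `subset_to_index` and `index_to_subset` are illustrative, not from the text.

```python
def subset_to_index(subset):
    """Index of a subset of {1, ..., n} in binary ordering: element i sets bit i - 1."""
    return sum(1 << (i - 1) for i in subset)


def index_to_subset(j):
    """Inverse mapping: the subset whose characteristic vector is the binary form of j."""
    return {i + 1 for i in range(j.bit_length()) if (j >> i) & 1}


print(subset_to_index({1, 3}))                   # 5, i.e. 101 in binary
print([index_to_subset(j) for j in range(8)])    # the first eight subsets in binary ordering
```

With this mapping a fuzzy measure on $n$ inputs fits in a flat list of length $2^n$, and the value for a subset is read with a single index computation.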


Using (2.58) and the above indexing system, we can write

$$C_v(\mathbf{x}) = \sum_{j=0}^{2^n - 1} v_j\, g_j(\mathbf{x}) = \langle \mathbf{g}(\mathbf{x}), \mathbf{v} \rangle, \tag{2.62}$$

where as earlier, functions $g_j$, $j = 0,\ldots,2^n - 1$ are defined by

$$g_j(\mathbf{x}) = \max\left(0, \min_{i \in \mathcal{J}} x_i - \max_{i \in \mathcal{N} \setminus \mathcal{J}} x_i\right), \tag{2.63}$$

and the characteristic vector of the set $\mathcal{J} \subseteq \mathcal{N}$ corresponds to the binary
representation of $j$.

An alternative ordering of the values of $v$ is based on set cardinality:

$$\emptyset, \underbrace{\{1\}, \{2\}, \ldots, \{n\}}_{n \text{ singletons}}, \underbrace{\{1,2\}, \{1,3\}, \ldots, \{1,n\}, \{2,3\}, \ldots, \{n-1,n\}}_{\binom{n}{2} \text{ pairs}}, \{1,2,3\}, \ldots$$

Such an ordering is useful when dealing with $k$-additive fuzzy measures
(see Definition 2.121 and Proposition 2.134 below), as it allows one to group
non-zero values $\mathcal{M}(A)$ (in Möbius representation) at the beginning of the
array. We shall discuss these orderings in Section 2.6.6.


Example 2.84. We continue Example 2.74 on p. 91. Let the fuzzy measure
$v$ be given as

            1
    0.5    0.9    0.9
    0.45   0.45   0.3
            0

so that the ratio of weights $w_M : w_P : w_L = 3 : 3 : 2$, but since mathematics
and physics overlap, the weight of the pair $v(\{M,P\}) = 0.5 < v(\{M\}) +
v(\{P\})$. On the other hand, weights attributed to $v(\{M,L\})$ and $v(\{P,L\})$
are greater than the sum of individual weights.

Using the Choquet integral, we obtain the global scores²⁰ $C_v(a_M, a_P, a_L) =
13.9$, $C_v(b_M, b_P, b_L) = 13.6$ and $C_v(c_M, c_P, c_L) = 14.9$, so that the students
are ranked as $b \prec a \prec c$ as required. Student $b$ has the lowest rank as requested
by the scientific tendency of the school.

20 Since the Choquet integral is a homogeneous aggregation function, we can calculate
it directly on $[0, 20]^n$ rather than scaling the inputs to $[0, 1]^n$.


2.6.3 Types of fuzzy measures 


The properties of the Choquet integral depend on the fuzzy measure v being 
used. There are various generic types of fuzzy measures, which lead to specific 
features of Choquet integral-based aggregation, and to several special cases, 
such as weighted arithmetic means, OWA and WOWA discussed earlier in this 
Chapter (see also Section 2.6.5). In this section we present the most important
definitions and classes of fuzzy measures. 


Definition 2.85 (Dual fuzzy measure). Given a fuzzy measure $v$, its dual
fuzzy measure $v^*$ is defined by

$$v^*(A) = 1 - v(A^c), \quad \text{for all } A \subseteq \mathcal{N},$$

where $A^c = \mathcal{N} \setminus A$ is the complement of $A$ in $\mathcal{N}$.


Definition 2.86 (Self-dual fuzzy measure). A fuzzy measure $v$ is self-dual
if it is equal to its dual $v^*$, i.e.,

$$v(A) + v(A^c) = 1 \quad \text{holds for all } A \subseteq \mathcal{N}.$$


Definition 2.87 (Submodular and supermodular fuzzy measure). A
fuzzy measure $v$ is called submodular if for any $A, B \subseteq \mathcal{N}$

$$v(A \cup B) + v(A \cap B) \le v(A) + v(B). \tag{2.64}$$

It is called supermodular if

$$v(A \cup B) + v(A \cap B) \ge v(A) + v(B). \tag{2.65}$$

Two weaker conditions which are frequently used are called sub- and
superadditivity. These are special cases of sub- and supermodularity for disjoint
subsets.


Definition 2.88 (Subadditive and superadditive fuzzy measure). A
fuzzy measure $v$ is called subadditive if for any two nonintersecting subsets
$A, B \subseteq \mathcal{N}$, $A \cap B = \emptyset$:

$$v(A \cup B) \le v(A) + v(B). \tag{2.66}$$

It is called superadditive if

$$v(A \cup B) \ge v(A) + v(B). \tag{2.67}$$
Note 2.89. Clearly sub-(super-) modularity implies sub-(super-) additivity.


Note 2.90. A fuzzy measure is supermodular if and only if its dual is submodular.
However the dual of a subadditive fuzzy measure is not necessarily superadditive
and vice versa.


Note 2.91. A general fuzzy measure may be submodular only with respect to specific
pairs of subsets $A, B$, and supermodular with respect to other pairs.


Definition 2.92 (Additive (probability) measure). A fuzzy measure $v$
is called additive if for any $A, B \subseteq \mathcal{N}$, $A \cap B = \emptyset$:

$$v(A \cup B) = v(A) + v(B). \tag{2.68}$$

An additive fuzzy measure is called a probability measure.


Note 2.93. A fuzzy measure is both sub- and supermodular if and only if it is
additive. A fuzzy measure is both sub- and superadditive if and only if it is additive.


Note 2.94. For an additive fuzzy measure clearly $v(A) = \sum_{i \in A} v(\{i\})$.


Note 2.95. Additivity implies that for any subset $A \subseteq \mathcal{N} \setminus \{i,j\}$

$$v(A \cup \{i,j\}) = v(A \cup \{i\}) + v(A \cup \{j\}) - v(A).$$


Definition 2.96 (Boolean measure). A fuzzy measure $v$ is called a boolean
fuzzy measure or $\{0,1\}$-measure if it holds:

$$v(A) = 0 \text{ or } v(A) = 1, \quad \text{for all } A \subseteq \mathcal{N}.$$


Definition 2.97 (Balanced measure). A fuzzy measure $v$ is called balanced
if it holds:

$$|A| < |B| \implies v(A) \le v(B), \quad \text{for all } A, B \subseteq \mathcal{N}.$$


Definition 2.98 (Symmetric fuzzy measure). A fuzzy measure $v$ is called
symmetric if the value $v(A)$ depends only on the cardinality of the set $A$, i.e.,
for any $A, B \subseteq \mathcal{N}$,

$$\text{if } |A| = |B| \text{ then } v(A) = v(B).$$

Alternatively, one can say that a fuzzy measure $v$ is symmetric if for any
$A \subseteq \mathcal{N}$ it is

$$v(A) = Q\left(\frac{|A|}{n}\right), \tag{2.69}$$

for some monotone non-decreasing function $Q : [0,1] \to [0,1]$, $Q(0) = 0$ and
$Q(1) = 1$.

Example 2.99. The following fuzzy measure is additive


1 
0.4 0.7 0.9 
0.1 0.3 0.6 
0 


The following fuzzy measure is symmetric 


1 
0.7 0.7 0.7 
0.2 0.2 0.2 
0 


The following fuzzy measure is superadditive but not submodular 


1 
0.6 0.5 0.6 
0.3 0.1 0.2 
0 


The following fuzzy measure is subadditive and symmetric 


1 
0.5 0.5 0.5 
0.5 0.5 0.5 
0 
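The claims of Example 2.99 can be verified by brute force over all pairs of subsets. The sketch below is illustrative: the helper names and the small numerical tolerance `TOL` (needed because the values are floating point numbers) are assumptions, and the code is written for $n = 3$ only.

```python
from itertools import combinations

N = (1, 2, 3)
SUBSETS = [frozenset(c) for k in range(4) for c in combinations(N, k)]
TOL = 1e-9


def measure(pairs, singles):
    """Build a measure on {1,2,3} from the rows (v12, v13, v23) and (v1, v2, v3)."""
    v = {frozenset(): 0.0, frozenset(N): 1.0}
    for key, val in zip([(1, 2), (1, 3), (2, 3)], pairs):
        v[frozenset(key)] = val
    for key, val in zip(N, singles):
        v[frozenset([key])] = val
    return v


def disjoint_pairs():
    return [(a, b) for a in SUBSETS for b in SUBSETS if not a & b]


def is_additive(v):
    return all(abs(v[a | b] - v[a] - v[b]) <= TOL for a, b in disjoint_pairs())


def is_subadditive(v):
    return all(v[a | b] <= v[a] + v[b] + TOL for a, b in disjoint_pairs())


def is_superadditive(v):
    return all(v[a | b] >= v[a] + v[b] - TOL for a, b in disjoint_pairs())


def is_submodular(v):
    return all(v[a | b] + v[a & b] <= v[a] + v[b] + TOL for a in SUBSETS for b in SUBSETS)


def is_symmetric(v):
    return all(abs(v[a] - v[b]) <= TOL for a in SUBSETS for b in SUBSETS if len(a) == len(b))


m1 = measure((0.4, 0.7, 0.9), (0.1, 0.3, 0.6))
m2 = measure((0.7, 0.7, 0.7), (0.2, 0.2, 0.2))
m3 = measure((0.6, 0.5, 0.6), (0.3, 0.1, 0.2))
m4 = measure((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
print(is_additive(m1), is_symmetric(m2), is_superadditive(m3), is_submodular(m3))
```

For the third measure, submodularity already fails on the pair $\{1,2\}$, $\{1,3\}$: $v(\{1,2,3\}) + v(\{1\}) = 1.3$ exceeds $v(\{1,2\}) + v(\{1,3\}) = 1.1$.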


Example 2.100. Let $v$ be the $\{0,1\}$-fuzzy measure on $\mathcal{N} = \{1, 2, 3\}$

         1
    1    1    0
    0    0    0
         0

This measure is superadditive but its dual fuzzy measure $v^*$, given by

         1
    1    1    1
    1    0    0
         0

is not subadditive, because, for instance, $v^*(\{2,3\}) = 1$ and $v^*(\{2\}) +
v^*(\{3\}) = 0$, nor is it superadditive, because $v^*(\{1,2,3\}) < v^*(\{1\}) +
v^*(\{2,3\})$.
Definition 2.101 (Possibility and necessity measures). A fuzzy measure
is called a possibility, $Pos$, if for all $A, B \subseteq \mathcal{N}$ it satisfies

$$Pos(A \cup B) = \max\{Pos(A), Pos(B)\}.$$

A fuzzy measure is called a necessity, $Nec$, if for all $A, B \subseteq \mathcal{N}$ it satisfies

$$Nec(A \cap B) = \min\{Nec(A), Nec(B)\}.$$


Note 2.102. Possibility and necessity measures are dual to each other in the sense
of Definition 2.85, that is, for all $A \subseteq \mathcal{N}$

$$Nec(A) = 1 - Pos(A^c).$$

A possibility measure is subadditive. A necessity measure is superadditive.


Possibility and necessity measures are the basis of the theory of possibility 
bsd 


B3 [25 (284). 


Example 2.103. The following fuzzy measure $v$ is a possibility measure

           1
    1     0.3    1
    0.3   1      0.2
           0


Example 2.104. The following fuzzy measure $v$ is a necessity measure, dual to
the possibility measure in the previous example

           1
    0.8   0      0.7
    0     0.7    0
           0
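The duality between the two examples can be checked directly from Definition 2.85. This is a minimal sketch, assuming the measure is stored as a dictionary keyed by `frozenset` and containing all $2^n$ subsets; the function name `dual` is hypothetical.

```python
def fs(*items):
    return frozenset(items)


def dual(v, n):
    """Dual measure v*(A) = 1 - v(complement of A)."""
    full = frozenset(range(1, n + 1))
    return {a: 1.0 - v[full - a] for a in v}


# Possibility measure of Example 2.103.
pos = {fs(): 0.0, fs(1): 0.3, fs(2): 1.0, fs(3): 0.2,
       fs(1, 2): 1.0, fs(1, 3): 0.3, fs(2, 3): 1.0, fs(1, 2, 3): 1.0}

nec = dual(pos, 3)
for a in sorted(nec, key=lambda s: (len(s), sorted(s))):
    print(sorted(a), round(nec[a], 10))
```

Applying `dual` twice returns the original measure, up to floating point rounding.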


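Definition 2.105 (Belief Measure). A belief measure $Bel : 2^{\mathcal{N}} \to [0,1]$ is
a fuzzy measure that satisfies the following condition: for all $m > 1$

$$Bel\left(\bigcup_{i=1}^{m} A_i\right) \ge \sum_{\emptyset \ne I \subseteq \{1,\ldots,m\}} (-1)^{|I|+1}\, Bel\left(\bigcap_{i \in I} A_i\right),$$

where $\{A_i\}_{i \in \{1,\ldots,m\}}$ is any finite family of subsets of $\mathcal{N}$.²¹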


21 For a fixed m > 1 this condition is called m-monotonicity (simple monotonicit 
for m = 1), and if it holds for all m > 1, it is called total monotonicity iza, 
For a fixed $m$, the condition in Definition 2.106 is called $m$-alternating monotonicity.
2-monotone fuzzy measures are called supermodular (see Definition 2.87), also
called convex, whereas 2-alternating fuzzy measures are called submodular. If a
fuzzy measure is $m$-monotone, its dual is $m$-alternating and vice versa.

Definition 2.106 (Plausibility measure). A plausibility measure $Pl :
2^{\mathcal{N}} \to [0,1]$ is a fuzzy measure that satisfies the following condition: for all
$m > 1$

$$Pl\left(\bigcap_{i=1}^{m} A_i\right) \le \sum_{\emptyset \ne I \subseteq \{1,\ldots,m\}} (-1)^{|I|+1}\, Pl\left(\bigcup_{i \in I} A_i\right),$$

where $\{A_i\}_{i \in \{1,\ldots,m\}}$ is any finite family of subsets of $\mathcal{N}$.


Belief and plausibility measures constitute the basis of Dempster and 
Shafer Evidence Theory 29). Belief measures are related to (and sometimes 
defined through) basic probability assignments, which are the values of the 
Mobius transformation. We refer the reader to the literature in this field, e.g., 


lud Bsd. 


Note 2.107. A set function $Pl : 2^{\mathcal{N}} \to [0, 1]$ is a plausibility measure if its dual set
function is a belief measure, i.e., for all $A \subseteq \mathcal{N}$

$$Pl(A) = 1 - Bel(A^c).$$

Any belief measure is superadditive. Any plausibility measure is subadditive.


Note 2.108. A fuzzy measure is both a belief and a plausibility measure if and only 
if it is additive. 


Note 2.109. A possibility measure is a plausibility measure and a necessity measure 
is a belief measure. 


Note 2.110. The set of all fuzzy measures (for a fixed $\mathcal{N}$) is convex.²² The sets of
subadditive, superadditive, submodular, supermodular,
additive, belief and plausibility fuzzy measures are convex. However the sets of
possibility and necessity measures are not convex.


λ-fuzzy measures


Additive and symmetric fuzzy measures are two examples of very simple fuzzy
measures, whereas general fuzzy measures are sometimes too complicated for
applications. Next we examine some fuzzy measures with intermediate
complexity, which are powerful enough to express interactions among the variables,
yet require far fewer than $2^n$ parameters to express them.

As a way of reducing the complexity of a fuzzy measure Sugeno
introduced λ-fuzzy measures (also called Sugeno measures).


Definition 2.111 (λ-fuzzy measure). Given a parameter $\lambda \in\, ]-1, \infty[$, a
λ-fuzzy measure is a fuzzy measure $v$ that for all $A, B \subseteq \mathcal{N}$, $A \cap B = \emptyset$ satisfies

$$v(A \cup B) = v(A) + v(B) + \lambda v(A) v(B). \tag{2.70}$$


22 A set $E$ is convex if $\alpha x + (1 - \alpha) y \in E$ for all $x, y \in E$, $\alpha \in [0, 1]$.

Under these conditions, all the values $v(A)$ are immediately computed
from $n$ independent values $v(\{i\})$, $i = 1,\ldots,n$, by using the explicit formula

$$v\left(\bigcup_{i=1}^{m} \{i\}\right) = \frac{1}{\lambda}\left(\prod_{i=1}^{m}\left(1 + \lambda v(\{i\})\right) - 1\right), \quad \lambda \ne 0.$$

If $\lambda = 0$, the λ-fuzzy measure becomes a probability measure. The coefficient $\lambda$ is
determined from the boundary condition $v(\mathcal{N}) = 1$, which gives

$$\lambda + 1 = \prod_{i=1}^{n}\left(1 + \lambda v(\{i\})\right), \tag{2.71}$$

which can be solved on $(-1,0)$ or $(0,\infty)$ numerically (note that $\lambda = 0$ is
always a solution). Thus a λ-fuzzy measure is characterized by $n$ independent
values $v(\{i\})$, $i = 1,\ldots,n$.
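Equation (2.71) has no closed-form solution in general, but a simple bisection is enough in practice. The sketch below is one way to do it, under assumptions that are not part of the text: at least two singleton values are positive, all of them are strictly below 1, and the search brackets (`-1` to `-1e-9`, or `1e-9` upward) are arbitrary numerical choices. The function names are hypothetical.

```python
def solve_lambda(singletons):
    """Non-trivial root of prod(1 + lam * v_i) = 1 + lam, found by bisection."""
    s = sum(singletons)
    if abs(s - 1.0) < 1e-12:
        return 0.0                      # additive case

    def f(lam):
        prod = 1.0
        for vi in singletons:
            prod *= 1.0 + lam * vi
        return prod - (1.0 + lam)

    if s > 1.0:
        lo, hi = -1.0, -1e-9            # f(lo) > 0 and f(hi) < 0
    else:
        lo, hi = 1e-9, 1.0              # f(lo) < 0; enlarge hi until f(hi) > 0
        while f(hi) <= 0.0:
            hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == (f(lo) > 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda_measure(singletons, lam, subset):
    """v(A) for a subset A of 1-based indices, from the explicit product formula."""
    if lam == 0.0:
        return sum(singletons[i - 1] for i in subset)
    prod = 1.0
    for i in subset:
        prod *= 1.0 + lam * singletons[i - 1]
    return (prod - 1.0) / lam


weights = [0.2, 0.3, 0.1]
lam = solve_lambda(weights)
print(lam, lambda_measure(weights, lam, [1, 2, 3]))   # the second value is close to 1
```

When the singleton values sum to less than 1 the root is positive, and when they sum to more than 1 it lies in $(-1, 0)$.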

A λ-fuzzy measure $v$ is related to a probability measure $P$ through the
relation

$$P(A) = \frac{\log(1 + \lambda v(A))}{\log(1 + \lambda)},$$

and, using $g(t) = ((1 + \lambda)^t - 1)/\lambda$ for $\lambda > -1$, $\lambda \ne 0$, and $g(t) = t$ for $\lambda = 0$,

$$g(P(A)) = v(A).$$


Note 2.112. The set of all λ-fuzzy measures is not convex.

A λ-fuzzy measure is an example of a distorted probability measure.


Definition 2.113 (Distorted probability measure). A fuzzy measure $v$
is a distorted probability measure if there exists some non-decreasing function
$g : [0,1] \to [0,1]$, $g(0) = 0$, $g(1) = 1$, and a probability measure $P$, such that
for all $A \subseteq \mathcal{N}$:

$$v(A) = g(P(A)).$$

We recall that weighted OWA functions are equivalent to
Choquet integrals with respect to distorted probabilities. Distorted probabilities
and their extension, $m$-dimensional distorted probabilities, have been
recently studied in [193].

A A-fuzzy measure is also an example of a decomposable fuzzy measure 


luj. 


Definition 2.114 (Decomposable fuzzy measure). A decomposable fuzzy
measure $v$ is a fuzzy measure which for all $A, B \subseteq \mathcal{N}$, $A \cap B = \emptyset$ satisfies

$$v(A \cup B) = f(v(A), v(B)) \tag{2.72}$$

for some function $f : [0,1]^2 \to [0,1]$ known as the decomposition function.

Note 2.115. It turns out that to get $v(\mathcal{N}) = 1$, $f$ must necessarily be a t-conorm
(see Chapter 3). In the case of λ-fuzzy measures, $f$ is an Archimedean t-conorm with
an additive generator $h(t) = \frac{\ln(1 + \lambda t)}{\ln(1 + \lambda)}$, $\lambda \ne 0$, which is a Sugeno-Weber t-conorm,
see p. 162.


Note 2.116. Additive measures are decomposable with respect to the Łukasiewicz
t-conorm $S_L(x, y) = \min(1, x + y)$. But not every $S_L$-decomposable fuzzy measure is a
probability. Possibility measures are decomposable with respect to the maximum
t-conorm $S_{\max}(x, y) = \max(x, y)$. Every $S_{\max}$-decomposable discrete fuzzy measure
is a possibility measure.


Note 2.117. A λ-fuzzy measure is either sub- or supermodular, when $-1 < \lambda \le 0$ or
$\lambda \ge 0$ respectively.


Note 2.118. When $-1 < \lambda \le 0$, a λ-fuzzy measure is a plausibility measure, and
when $\lambda \ge 0$ it is a belief measure.


Note 2.119. Dirac measures (Example 2.77) are λ-fuzzy measures for all $\lambda \in\, ]-1, \infty[$.


Note 2.120. For a given t-conorm $S$ and fixed $\mathcal{N}$, the set of all $S$-decomposable
fuzzy measures is not always convex.


k-additive fuzzy measures


Another way to reduce complexity of aggregation functions based on fuzzy
measures is to impose various linear constraints on their values. Such
constraints acquire an interesting interpretation in terms of interaction indices
discussed in the next section. One type of constraints leads to $k$-additive
fuzzy measures.


Definition 2.121 (k-additive fuzzy measure). A fuzzy measure $v$ is
called $k$-additive ($1 \le k \le n$) if its Möbius transformation verifies

$$\mathcal{M}(A) = 0$$

for any subset $A$ with more than $k$ elements, $|A| > k$, and there exists a subset
$B$ with $k$ elements such that $\mathcal{M}(B) \ne 0$.

An alternative definition of k-additivity (which is also applicable to fuzzy 
measures on more general sets than M) was given by Mesiar in usg. It 
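involves a weakly monotone²³ additive set function $v_k$ defined on subsets of
$\mathcal{N}^k = \mathcal{N} \times \mathcal{N} \times \ldots \times \mathcal{N}$. A fuzzy measure $v$ is $k$-additive if $v(A) = v_k(A^k)$ for
all $A \subseteq \mathcal{N}$.


23 Weakly monotone means $\forall A, B \subseteq \mathcal{N}$, $A \subseteq B$ implies $v_k(A^k) \le v_k(B^k)$.
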
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2.6.4 Interaction, importance and other indices 


When dealing with multiple criteria, it is often the case that these are not 
independent, and there is some interaction (positive or negative) among the 
criteria. For instance, two or more criteria may point essentially to the same 
concept, for example criteria such as “learnability” and “memorability” that 
are used to evaluate software user interface [226]. If the criteria are combined 
by using, e.g., weighted means, their scores will be double counted. In other 
instances, the contribution of one criterion to the total score by itself may be
small, but may rise sharply when taken in conjunction with other criteria (i.e., in
a “coalition”; see footnote 24).

Thus to measure such concepts as the importance of a criterion and
interaction among the criteria, we need to account for the contribution of these
criteria in various coalitions. To do this we will use the concepts of Shapley
value, which measures the importance of a criterion i in all possible coalitions,
and the interaction index, which measures the interaction of a pair of criteria
i, j in all possible coalitions [107, 108].


Definition 2.122 (Shapley value). Let v be a fuzzy measure. The Shapley
index for every i ∈ N is

$$\phi(i) = \sum_{A \subseteq N \setminus \{i\}} \frac{(n - |A| - 1)!\,|A|!}{n!} \left[ v(A \cup \{i\}) - v(A) \right].$$

The Shapley value is the vector φ(v) = (φ(1), ..., φ(n)).

Note 2.123. It is informative to write the Shapley index as

$$\phi(i) = \frac{1}{n} \sum_{t=0}^{n-1} \frac{1}{\binom{n-1}{t}} \sum_{A \subseteq N \setminus \{i\},\ |A| = t} \left[ v(A \cup \{i\}) - v(A) \right].$$


Note 2.124. For an additive fuzzy measure we have φ(i) = v({i}).


The Shapley value is interpreted as a kind of average value of the contri- 
bution of each criterion alone in all coalitions. 


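Definition 2.125 (Interaction index). Let v be a fuzzy measure. The
interaction index for every pair i, j ∈ N is

$$I_{ij} = \sum_{A \subseteq N \setminus \{i,j\}} \frac{(n - |A| - 2)!\,|A|!}{(n-1)!} \left[ v(A \cup \{i,j\}) - v(A \cup \{i\}) - v(A \cup \{j\}) + v(A) \right].$$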


24 Such interactions are well known in game theory. For example, contributions of 
the efforts of workers in a group can be greater or smaller than the sum of their 
separate contributions (if working independently). 
The interaction indices verify $I_{ij} < 0$ as soon as i, j are positively
correlated (negative synergy, redundancy). Similarly $I_{ij} > 0$ for negatively
correlated criteria (positive synergy, complementarity). $I_{ij} \in [-1, 1]$ for any pair
i, j.

Proposition 2.126. For a submodular fuzzy measure v, all interaction
indices verify $I_{ij} \le 0$. For a supermodular fuzzy measure, all interaction indices
verify $I_{ij} \ge 0$.


Definition 2.125, due to Murofushi and Soneda, was extended by
Grabisch to any coalition A of the criteria (not just pairs) [107].


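Definition 2.127 (Interaction index for coalitions). Let v be a fuzzy
measure. The interaction index for every set A ⊆ N is

$$I(A) = \sum_{B \subseteq N \setminus A} \frac{(n - |B| - |A|)!\,|B|!}{(n - |A| + 1)!} \sum_{C \subseteq A} (-1)^{|A \setminus C|}\, v(B \cup C).$$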


Note 2.128. Clearly I(A) coincides with $I_{ij}$ if A = {i, j}, and coincides with φ(i) if
A = {i}. Also I(A) satisfies the dummy criterion axiom: if i is a dummy criterion,
i.e., v(B ∪ {i}) = v(B) + v({i}) for any B ⊂ N \ {i}, then for every such B ≠ ∅,
I(B ∪ {i}) = 0. A dummy criterion does not interact with other criteria in any
coalition.


Note 2.129. An alternative single-sum expression for I(A) was obtained in [115]:

$$I(A) = \sum_{B \subseteq N} \frac{(-1)^{|A \setminus B|}}{(n - |A| + 1) \binom{n - |A|}{|B \setminus A|}}\, v(B).$$


An alternative to the Shapley value is the Banzhaf index. It measures
the same concept as the Shapley index, but weights the terms [v(A ∪ {i}) −
v(A)] in the sum equally.


Definition 2.130 (Banzhaf index). Let v be a fuzzy measure. The Banzhaf
index $b_i$ for every i ∈ N is

$$b_i = \frac{1}{2^{n-1}} \sum_{A \subseteq N \setminus \{i\}} \left[ v(A \cup \{i\}) - v(A) \right].$$


This definition has been generalized by Roubens in [211].


Definition 2.131 (Banzhaf interaction index for coalitions). Let v be a
fuzzy measure. The Banzhaf interaction index between the elements of A ⊆ N
is given by

$$J(A) = \frac{1}{2^{n-|A|}} \sum_{B \subseteq N \setminus A} \sum_{C \subseteq A} (-1)^{|A \setminus C|}\, v(B \cup C).$$


Note 2.132. An alternative single-sum expression for J(A) was obtained in [115]:

$$J(A) = \frac{1}{2^{n-|A|}} \sum_{B \subseteq N} (-1)^{|A \setminus B|}\, v(B).$$


The Möbius transformation helps us to express the indices mentioned above in
a more compact form [107, 108], namely

$$\phi(i) = \sum_{B \,|\, i \in B} \frac{1}{|B|} M(B),$$

$$I(A) = \sum_{B \,|\, A \subseteq B} \frac{1}{|B| - |A| + 1} M(B),$$

$$J(A) = \sum_{B \,|\, A \subseteq B} \frac{1}{2^{|B| - |A|}} M(B).$$


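Example 2.133. Let v be the fuzzy measure defined as follows (rows list subsets by
decreasing cardinality: N; then {1,2}, {1,3}, {2,3}; then {1}, {2}, {3}; then ∅):


          1
   0.5   0.6   0.7
   0.1   0.2   0.3
          0


Then the Shapley indices are φ(1) = 0.7/3, φ(2) = 1/3, φ(3) = 1.3/3.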
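The definitions of the Shapley index and the pairwise interaction index translate directly into a short program. The following is a minimal sketch, assuming the fuzzy measure is stored as a Python dictionary keyed by `frozenset` (a representation chosen here for readability, not one prescribed by the text); the names `shapley_index`, `interaction_index` and `V_EXAMPLE` are illustrative. The data are those of the example above, read row by row as v(N); v({1,2}), v({1,3}), v({2,3}); v({1}), v({2}), v({3}); v(∅).

```python
from itertools import combinations
from math import factorial


def shapley_index(v, n, i):
    """Shapley index of criterion i; v maps frozensets of {1..n} to [0, 1]."""
    others = [j for j in range(1, n + 1) if j != i]
    total = 0.0
    for size in range(n):  # |A| = 0, ..., n - 1
        weight = factorial(n - size - 1) * factorial(size) / factorial(n)
        for subset in combinations(others, size):
            a = frozenset(subset)
            total += weight * (v[a | {i}] - v[a])
    return total


def interaction_index(v, n, i, j):
    """Pairwise interaction index I_ij of criteria i and j."""
    others = [k for k in range(1, n + 1) if k not in (i, j)]
    total = 0.0
    for size in range(n - 1):  # |A| = 0, ..., n - 2
        weight = factorial(n - size - 2) * factorial(size) / factorial(n - 1)
        for subset in combinations(others, size):
            a = frozenset(subset)
            total += weight * (v[a | {i, j}] - v[a | {i}] - v[a | {j}] + v[a])
    return total


# Fuzzy measure of the example (assumed row-by-row reading of the lattice).
V_EXAMPLE = {
    frozenset(): 0.0,
    frozenset({1}): 0.1, frozenset({2}): 0.2, frozenset({3}): 0.3,
    frozenset({1, 2}): 0.5, frozenset({1, 3}): 0.6, frozenset({2, 3}): 0.7,
    frozenset({1, 2, 3}): 1.0,
}

shapley_values = [shapley_index(V_EXAMPLE, 3, i) for i in (1, 2, 3)]
```

The Shapley indices always sum to v(N) = 1, which gives a quick sanity check on any implementation. The cost grows as 2^n, so this direct enumeration is only practical for a modest number of criteria.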


The next result due to Grabisch establishes a fundamental prop- 
erty of k-additive fuzzy measures, which justifies their use in simplifying in- 
teractions between the criteria in multiple criteria decision making. 


Proposition 2.134. Let v be a k-additive fuzzy measure, 1 ≤ k ≤ n. Then


• I(A) = 0 for every A ⊆ N such that |A| > k;
• I(A) = J(A) = M(A) for every A ⊆ N such that |A| = k.


Thus k-additive measures acquire an interesting interpretation. These are 
fuzzy measures that limit interaction among the criteria to groups of size 
at most k. For instance, for 2-additive fuzzy measures, there are pairwise 
interactions among the criteria but no interactions in groups of 3 or more. By 
limiting the class of fuzzy measures to k-additive measures, one reduces their 
complexity (the number of values) by imposing linear equality constraints. 
The total number of linearly independent values is reduced from $2^n - 1$ to

$$\sum_{i=1}^{k} \binom{n}{i} - 1.$$

Orness value


We recall the definition of the measure of orness of an aggregation function
on p. 40. By using the Möbius transform one can calculate the orness of a
Choquet integral $C_v$ with respect to a fuzzy measure v as follows.


Proposition 2.135 (Orness of Choquet integral). For any fuzzy
measure v the orness of the Choquet integral with respect to v is

$$\mathrm{orness}(C_v) = \frac{1}{n-1} \sum_{A \subseteq N} \frac{n - |A|}{|A| + 1}\, M(A),$$

where M(A) is the Möbius representation of A. In terms of v the orness value
is

$$\mathrm{orness}(C_v) = \frac{1}{n-1} \sum_{A \subsetneq N} \frac{(n - |A|)!\,|A|!}{n!}\, v(A).$$
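The second expression (in terms of v, summing over proper subsets of N with coefficients (n − |A|)! |A|! / n!) is straightforward to evaluate by enumeration. The sketch below assumes the same dictionary-of-`frozenset` representation of v used for illustration elsewhere; `orness_choquet` and `make_measure` are hypothetical helper names.

```python
from itertools import combinations
from math import factorial


def orness_choquet(v, n):
    """Orness of the Choquet integral w.r.t. v, computed from the values of v."""
    total = 0.0
    for size in range(n):  # proper subsets only: |A| = 0, ..., n - 1
        weight = factorial(n - size) * factorial(size) / factorial(n)
        for subset in combinations(range(1, n + 1), size):
            total += weight * v[frozenset(subset)]
    return total / (n - 1)


def make_measure(n, value_of_cardinality):
    """Build a symmetric fuzzy measure from a function of |A| (illustrative helper)."""
    return {
        frozenset(subset): value_of_cardinality(size)
        for size in range(n + 1)
        for subset in combinations(range(1, n + 1), size)
    }


# Three reference points: maximum (orness 1), minimum (orness 0), arithmetic mean (1/2).
n = 4
v_max = make_measure(n, lambda t: 1.0 if t > 0 else 0.0)
v_min = make_measure(n, lambda t: 1.0 if t == n else 0.0)
v_mean = make_measure(n, lambda t: t / n)
```

The three symmetric measures correspond to the maximum, the minimum and the arithmetic mean, whose orness values (1, 0 and 1/2) are known independently and so serve as a check.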


Another (simplified) criterion which measures positive interaction among
pairs of criteria is based on the degree of substitutivity (see footnote 25).


Definition 2.136 (Substitutive criteria). Let i, j be two criteria, and let
$\nu_{ij} \in [0, 1]$ be the degree of substitutivity. The fuzzy measure v is called
substitutive with respect to the criteria i, j if for any subset A ⊆ N \ {i, j}

$$v(A \cup \{i, j\}) \le v(A \cup \{i\}) + (1 - \nu_{ij})\, v(A \cup \{j\}),$$
$$v(A \cup \{i, j\}) \le v(A \cup \{j\}) + (1 - \nu_{ij})\, v(A \cup \{i\}). \qquad (2.73)$$

Note 2.137. When $\nu_{ij} = 1$, and in view of the monotonicity of fuzzy measures, we
obtain the equalities v(A ∪ {i, j}) = v(A ∪ {i}) = v(A ∪ {j}), i.e., fully substitutive
(identical) criteria. One of these criteria can be seen as dummy.


Note 2.138. If $\nu_{ij} = 0$, i.e., the criteria are not positively substitutive, then
v(A ∪ {i, j}) ≤ v(A ∪ {i}) + v(A ∪ {j}), which is a weaker version of (2.66). It does not
imply independence, as the criteria may have negative interaction. Also note that
it does not imply subadditivity, as v(A) ≥ 0, and it only applies to one particular
pair of criteria, i, j, not to all pairs. On the other hand subadditivity implies (2.73)
for all pairs of criteria, and with some $\nu_{ij} \ge 0$, i.e., all the criteria are substitutive
to some degree.


Entropy 


The issue of the entropy of Choquet integrals was treated in [147, 166].


25 See discussion in md, p.318. 
Definition 2.139. The entropy of a fuzzy measure v is

$$H(v) = \sum_{i \in N} \sum_{A \subseteq N \setminus \{i\}} \frac{(n - |A| - 1)!\,|A|!}{n!}\, h\big(v(A \cup \{i\}) - v(A)\big),$$

with h(t) = −t log t, if t > 0 and h(0) = 0.





This definition coincides with the definition of weights dispersion (Definition
2.37) used for weighted arithmetic mean and OWA functions, when v is
additive or symmetric (see Proposition 2.143 below). The maximal value of H
is log n and is achieved if and only if v is an additive symmetric fuzzy measure,
i.e., v(A) = |A|/n for all A ⊆ N. The minimal value 0 is achieved if and only
if v is a Boolean fuzzy measure. Also H is a strictly concave function of v, 
which is useful when maximizing H over a convex subset of fuzzy measures, 
as it leads to a unique global maximum. 
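The entropy has the same combinatorial structure as the Shapley index, with the marginal contribution v(A ∪ {i}) − v(A) passed through h(t) = −t log t. A minimal sketch follows, assuming v is a dictionary keyed by `frozenset` (an illustrative representation) and using natural logarithms; the function name `entropy` is hypothetical.

```python
from itertools import combinations
from math import factorial, log


def entropy(v, n):
    """Entropy H(v) of a fuzzy measure v on {1, ..., n} (natural logarithm)."""

    def h(t):
        # h(t) = -t log t for t > 0, and h(0) = 0 by convention.
        return -t * log(t) if t > 0 else 0.0

    total = 0.0
    for i in range(1, n + 1):
        others = [j for j in range(1, n + 1) if j != i]
        for size in range(n):
            weight = factorial(n - size - 1) * factorial(size) / factorial(n)
            for subset in combinations(others, size):
                a = frozenset(subset)
                total += weight * h(v[a | {i}] - v[a])
    return total


def symmetric_measure(n, value_of_cardinality):
    """Illustrative helper: symmetric measure given as a function of |A|."""
    return {
        frozenset(subset): value_of_cardinality(size)
        for size in range(n + 1)
        for subset in combinations(range(1, n + 1), size)
    }


n = 3
v_uniform = symmetric_measure(n, lambda t: t / n)               # additive symmetric
v_boolean = symmetric_measure(n, lambda t: 1.0 if t else 0.0)   # Boolean (maximum)
```

For the additive symmetric measure the result equals log n, and for the Boolean measure it equals 0, matching the extreme values stated above.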


2.6.5 Special cases of the Choquet integral 


Let us now study special cases of Choquet integral with respect to fuzzy 
measures with specific properties. 


Proposition 2.140. If v* is a fuzzy measure dual to a fuzzy measure v, the
Choquet integrals $C_v$ and $C_{v^*}$ are dual to each other. If v is self-dual, then $C_v$
is a self-dual aggregation function.


Proposition 2.141. The Choquet integral with respect to an additive fuzzy
measure v is the weighted arithmetic mean $M_w$ with the weights $w_i = v(\{i\})$.


Proposition 2.142. The Choquet integral with respect to a symmetric fuzzy
measure v defined by means of a quantifier Q as in (2.69) is the OWA function
$OWA_w$ with the weights (see footnote 26) $w_i = Q\left(\frac{i}{n}\right) - Q\left(\frac{i-1}{n}\right)$.


The values of the fuzzy measure v, associated with an OWA function with
the weighting vector w, are also expressed as

$$v(A) = \sum_{i = n - |A| + 1}^{n} w_i.$$

If a fuzzy measure is symmetric and additive at the same time, we have

Proposition 2.143. The Choquet integral with respect to a symmetric additive
fuzzy measure is the arithmetic mean M, and the values of v are given
by

$$v(A) = \frac{|A|}{n}.$$


26 We remind that in the definition of OWA, we used a non-increasing permutation of
the components of x, $\mathbf{x}_{\searrow}$, whereas in the Choquet integral we use a non-decreasing
permutation $\mathbf{x}_{\nearrow}$. Then OWA is expressed as

$$C_v(\mathbf{x}) = \sum_{i=1}^{n} x_{\nearrow(i)} \left( Q\left(\tfrac{n-i+1}{n}\right) - Q\left(\tfrac{n-i}{n}\right) \right) = \sum_{i=1}^{n} x_{\searrow(i)} \left( Q\left(\tfrac{i}{n}\right) - Q\left(\tfrac{i-1}{n}\right) \right).$$

We also remind that Q : [0, 1] → [0, 1], Q(0) = 0, Q(1) = 1 is a RIM quantifier
which determines values of v as $v(A) = Q\left(\frac{|A|}{n}\right)$.


Proposition 2.144. The Choquet integral with respect to a boolean fuzzy
measure v coincides with the Sugeno integral (see Section 2.7 below).


2- and 3-additive symmetric fuzzy measures 


These are two special cases of symmetric fuzzy measures, for which we can
write down explicit formulas for determination of the values of the fuzzy
measure. By Proposition 2.142, the Choquet integral with respect to any symmetric
fuzzy measure is an OWA function with the weights $w_i = Q\left(\frac{i}{n}\right) - Q\left(\frac{i-1}{n}\right)$. If
v is additive, then Q(t) = t, and the Choquet integral becomes the arithmetic
mean M.

Let us now determine function Q for less restrictive 2- and 3-additive 
fuzzy measures. These fuzzy measures allow interactions of pairs and triples 
of criteria, but not in bigger coalitions. 

It turns out that in these two cases, the function Q is necessarily quadratic
or cubic, as given in Proposition 2.72, p. 86, namely, for a 2-additive symmetric
fuzzy measure

$$Q(t) = at^2 + (1 - a)t \quad \text{for some } a \in [-1, 1],$$

which implies that OWA weights are equidistant (i.e., $w_{i+1} - w_i = \text{const}$ for
all i = 1, ..., n − 1). In the 3-additive case we have

$$Q(t) = at^3 + bt^2 + (1 - a - b)t \quad \text{for some } a \in [-2, 4],$$

such that
• if a ∈ [−2, 1] then b ∈ [−2a − 1, 1 − a];
• if a ∈ ]1, 4] then $b \in \left[-3a/2 - \sqrt{3a(4-a)/4},\ -3a/2 + \sqrt{3a(4-a)/4}\right]$.
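To see the equidistant weights concretely, one can generate the OWA weights from a quantifier using w_i = Q(i/n) − Q((i−1)/n). The sketch below is illustrative only; `owa_weights_from_quantifier` and `quadratic_quantifier` are hypothetical names, and the parameter value a = 0.5 is an arbitrary choice within [−1, 1].

```python
def owa_weights_from_quantifier(q, n):
    """OWA weights w_i = Q(i/n) - Q((i-1)/n), i = 1, ..., n."""
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def quadratic_quantifier(a):
    """Q(t) = a t^2 + (1 - a) t, the 2-additive symmetric case (a in [-1, 1])."""
    return lambda t: a * t * t + (1 - a) * t


n = 4
weights = owa_weights_from_quantifier(quadratic_quantifier(0.5), n)
steps = [weights[i + 1] - weights[i] for i in range(n - 1)]
```

Here each consecutive difference equals 2a/n², so `steps` holds the same value three times, and the weights sum to Q(1) − Q(0) = 1.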





2.6.6 Fitting fuzzy measures 


Identification of the $2^n - 2$ values from the data (two are given explicitly as
v(∅) = 0, v(N) = 1) involves the least squares or least absolute deviation
problems

$$\text{minimize } \sum_{k=1}^{K} \left( C_v(x_{1k}, \ldots, x_{nk}) - y_k \right)^2, \quad \text{or}$$

$$\text{minimize } \sum_{k=1}^{K} \left| C_v(x_{1k}, \ldots, x_{nk}) - y_k \right|,$$

subject to the conditions of monotonicity of the fuzzy measure (they translate
into a number of linear constraints). If we use the binary indexing system
introduced earlier, the conditions of monotonicity v(T) ≤ v(S) whenever T ⊆ S can
be written as $v_i \le v_j$ if i ≤ j and AND(i, j) = i (AND is the usual bitwise
operation, applied to the binary representations of i, j, which sets the k-th bit of
the result to 1 if and only if the k-th bits of i and j are 1).

Thus, in the least squares case we have the optimization problem

$$\text{minimize } \sum_{k=1}^{K} \left( \langle g(x_{1k}, \ldots, x_{nk}), \mathbf{v} \rangle - y_k \right)^2, \qquad (2.74)$$

$$\text{s.t. } v_j - v_i \ge 0, \text{ for all } i < j \text{ such that } AND(i, j) = i,$$
$$v_j \ge 0,\ j = 1, \ldots, 2^n - 2, \quad v_{2^n - 1} = 1,$$

which is clearly a QP problem. In the least absolute deviation case we obtain

$$\text{minimize } \sum_{k=1}^{K} \left| \langle g(x_{1k}, \ldots, x_{nk}), \mathbf{v} \rangle - y_k \right|, \qquad (2.75)$$

$$\text{s.t. } v_j - v_i \ge 0, \text{ for all } i < j \text{ such that } AND(i, j) = i,$$
$$v_j \ge 0,\ j = 1, \ldots, 2^n - 2, \quad v_{2^n - 1} = 1,$$

which is converted into an LP problem (see Appendix A.2 for how LAD is
converted into LP). These are two standard problem formulations that are
solved by standard QP and LP methods.

Note that in formulations (2.74) and (2.75) most monotonicity constraints
will be redundant. It is sufficient to include only constraints such that
AND(i, j) = i, and i and j differ by only one bit (i.e., the cardinalities of the
corresponding subsets satisfy |J| − |I| = 1). There will be $n(2^{n-1} - 1)$
non-redundant constraints. Explicit expression of the constraints in matrix form is
complicated, but they are easily specified by using an incremental algorithm
for each n. Further, many non-negativity constraints will be redundant as
well (only v({i}) ≥ 0, i = 1, ..., n are needed), but since they form part of a
standard LP problem formulation anyway, we will keep them.
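The non-redundant monotonicity constraints can be enumerated with a few bit operations. The sketch below assumes the binary indexing in which bit k − 1 of the index is set exactly when criterion k belongs to the subset, so that index 0 is the empty set and index 2^n − 1 is N; the function name `monotonicity_constraints` is illustrative.

```python
def monotonicity_constraints(n):
    """List pairs (i, j) encoding v_j - v_i >= 0, where j adds one element to i.

    Subsets are indexed by bitmasks. The empty set (index 0) is skipped,
    because v(empty) = 0 turns those constraints into the non-negativity
    conditions v({k}) >= 0, which are handled separately.
    """
    pairs = []
    for i in range(1, 2 ** n):
        for bit in range(n):
            if not i & (1 << bit):       # criterion bit+1 is not in subset i
                j = i | (1 << bit)       # add it to obtain the superset j
                pairs.append((i, j))
    return pairs


constraints_n3 = monotonicity_constraints(3)
constraints_n4 = monotonicity_constraints(4)
```

Every generated pair satisfies i < j and AND(i, j) = i, and the count agrees with n(2^(n−1) − 1): 9 pairs for n = 3 and 28 for n = 4. Each pair becomes one sparse row (a +1 and a −1) of the constraint matrix passed to an LP or QP solver.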

However, the main difficulty in these problems is the large number of
unknowns, and typically a much smaller number of data. While
modern LP and QP methods handle well the resulting (degenerate) problems,
for n > 10 one needs to take into account the sparse structure of the system
of constraints. For larger n (e.g., n = 15, $2^n - 1 = 32767$) QP methods are not
as robust as LP, which can handle millions of variables. It is also important to
understand that if the number of data K < $2^n$, there will be multiple optimal
solutions, i.e., (infinitely) many fuzzy measures that fit the data.

As discussed in the literature, multiple solutions sometimes lead to counterintuitive
results, because many values of v will be near 0 or 1. It was proposed to use a
heuristic, which, in the absence of any data, chooses the “most additive” fuzzy
measure, i.e., converts the Choquet integral into the arithmetic mean. Grabisch
has developed a heuristic least mean squares algorithm.
It is not difficult to incorporate the above mentioned heuristic into QP
or LP problems like (2.74) and (2.75). Firstly, one may use the variables
$\tilde{v}_j = v_j - u_j$ instead of $v_j$ ($u_j$ denote the values of the additive symmetric
fuzzy measure u(A) = |A|/n). In this case the default 0 values of the fitted $\tilde{v}_j$ (in
the absence of data) will result in $v_j = u_j$. On the other hand, it is possible to
introduce penalty terms into the objective functions in (2.74) and (2.75) by
means of artificial data, e.g., to replace the objective function in (2.74) with

$$\sum_{k=1}^{K} \left( \langle g(x_{1k}, \ldots, x_{nk}), \mathbf{v} \rangle - y_k \right)^2 + \frac{p}{2^n} \sum_{k=1}^{2^n} \left( \langle g(a_{1k}, \ldots, a_{nk}), \mathbf{v} \rangle - b_k \right)^2,$$

with p being a small penalty parameter, $\mathbf{a}_k$ being the vertices of the unit cube
and $b_k = \frac{1}{n} \sum_{i=1}^{n} a_{ik}$.

There are also alternative heuristic methods for fitting discrete Choquet
integrals to empirical data. An overview of exact and approximate methods
is available in the literature.


Other requirements on fuzzy measures 


There are many other requirements that can be imposed on the fuzzy measure 
from the problem specifications. Some conditions are aimed at reducing the 
complexity of the fitting problem (by reducing the number of parameters), 
whereas others have direct meaningful interpretation. 


Importance and interaction indices 


The interaction indices defined in Section 2.6.4 are all linear functions of
the values of the fuzzy measure. Conditions involving these functions can be
expressed as linear equations and inequalities.

One can specify given values of importance (Shapley value) and interaction
indices φ(i), $I_{ij}$ by adding linear equality constraints to
the problems (2.74) and (2.75). Of course, these values may not be specified
exactly, but as intervals, say, for the Shapley value we may have $a_i \le \phi(i) \le b_i$.
In this case we obtain a pair of linear inequalities.


Substitutive criteria

For substitutive criteria i, j we add (see Definition 2.136)

$$v(A \cup \{i, j\}) \le v(A \cup \{i\}) + (1 - \nu_{ij})\, v(A \cup \{j\}),$$
$$v(A \cup \{i, j\}) \le v(A \cup \{j\}) + (1 - \nu_{ij})\, v(A \cup \{i\}),$$

for all subsets A ⊆ N \ {i, j}, where $\nu_{ij} \in [0, 1]$ is the degree of substitutivity.
These are also linear inequality constraints added to the quadratic or linear
programming problems.
k-additivity


Recall that Definition 2.121 specifies k-additive fuzzy measures through their
Möbius transform

$$M(A) = 0$$

for any subset A with more than k elements. Since the Möbius transform is a
linear combination of values of v, we obtain a set of linear equalities. By using
interaction indices, we can express k-additivity as (see Proposition 2.134)
I(A) = 0 for every A ⊆ N, |A| > k, which is again a set of linear equalities.


All of the mentioned conditions on the fuzzy measures do not reduce the
complexity of quadratic or linear programming problems (2.74) and (2.75).
They only add a number of equality and inequality constraints to these prob- 
lems. The aim of introducing these conditions is not to simplify the problem, 
but to better fit a fuzzy measure to the problem and data at hand, especially 
when the number of data is small. 

Let us now consider simplifying assumptions, which do reduce the
complexity. First recall that adding the symmetry makes the Choquet integral an
OWA function. In this case we only need to determine n (instead of $2^n - 2$)
values. Thus we can use the techniques for fitting OWA weights, discussed in
Section 2.5. In the case of 2- and 3-additive symmetric fuzzy measures we
can fit generating functions Q, as discussed in Section 2.5.5.


λ-fuzzy measures


Fitting λ-fuzzy measures also involves a reduced set of parameters. Recall (p.
102) that λ-fuzzy measures are specified by n parameters v({i}), i = 1, ..., n.
The other values are determined from

$$v(A) = \frac{1}{\lambda} \left( \prod_{i \in A} \big(1 + \lambda v(\{i\})\big) - 1 \right)$$

with the help of the parameter λ, which itself is computed from

$$\lambda + 1 = \prod_{i=1}^{n} \big(1 + \lambda v(\{i\})\big).$$

The latter is a non-linear equation, which is solved numerically on (−1, 0)
or (0, ∞). This means that the Choquet integral $C_v$ becomes a nonlinear
function of parameters v({i}), and therefore problems (2.74) and (2.75) become
difficult non-linear programming problems. The problem of fitting these fuzzy
measures was studied in [56] among others, and the methods are
based on genetic algorithms.
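The equation for λ has the trivial root λ = 0, so a numerical solver has to bracket the non-trivial root on the correct side. The following is a minimal sketch using plain bisection, under the assumptions that all singleton values lie strictly between 0 and 1, that at least two of them are positive, and that their sum differs from 1 by more than 1e-6 (otherwise the measure is treated as additive). The function names and tolerances are illustrative choices, not taken from the text.

```python
from math import prod


def solve_lambda(singletons):
    """Non-trivial root of  lam + 1 = prod(1 + lam * v_i)  by bisection.

    Assumes 0 < v_i < 1. The root lies in (-1, 0) when sum(v_i) > 1
    and in (0, inf) when sum(v_i) < 1.
    """
    total = sum(singletons)
    if abs(total - 1.0) < 1e-6:
        return 0.0  # (nearly) additive measure

    def f(lam):
        return prod(1.0 + lam * w for w in singletons) - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, -1e-6          # f(lo) > 0 > f(hi)
    else:
        lo, hi = 1e-6, 1.0            # f(lo) < 0; grow hi until f(hi) >= 0
        while f(hi) < 0.0:
            hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == (f(lo) > 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda_measure(singletons, lam, subset):
    """v(A) for A given as an iterable of 0-based criterion positions."""
    if lam == 0.0:
        return sum(singletons[i] for i in subset)
    return (prod(1.0 + lam * singletons[i] for i in subset) - 1.0) / lam


lam_super = solve_lambda([0.2, 0.3, 0.1])   # sum < 1, so lambda > 0
lam_sub = solve_lambda([0.5, 0.6, 0.7])     # sum > 1, so -1 < lambda < 0
```

Once λ is known, any value v(A) follows from the singletons alone, and v(N) evaluates to 1 up to rounding. Because λ depends non-linearly on the singletons, this root-finding step would sit inside every objective evaluation of a fitting procedure.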
Representation based on Möbius transformation


In this section we reformulate the fitting problem for general k-additive fuzzy
measures based on the Möbius representation. Our goal is to use this
representation to reduce the complexity of problems (2.74) and (2.75). We remind that
this is an invertible linear transformation such that:


• $M(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|}\, v(B)$ and
  $v(A) = \sum_{B \subseteq A} M(B)$, for all A ⊆ N.


• The Choquet integral is expressed as

  $$C_v(\mathbf{x}) = \sum_{A \subseteq N} M(A) \min_{i \in A} x_i.$$


• k-additivity holds when
  M(A) = 0
  for any subset A with more than k elements.
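These three facts can be checked numerically on a small measure. The sketch below assumes v is stored as a dictionary keyed by `frozenset`, with criterion i having input value `x[i - 1]`; the names `mobius_transform`, `choquet_from_mobius`, `additivity_order` and the sample measure `V_EXAMPLE` (values 0.1, 0.2, 0.3 on singletons and 0.5, 0.6, 0.7 on pairs) are illustrative.

```python
from itertools import combinations


def subsets(items):
    """All subsets of a collection, as frozensets."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def mobius_transform(v, n):
    """M(A) = sum over B subset of A of (-1)^{|A \\ B|} v(B)."""
    return {
        a: sum((-1) ** (len(a) - len(b)) * v[b] for b in subsets(a))
        for a in subsets(range(1, n + 1))
    }


def choquet_from_mobius(m, x):
    """C_v(x) = sum over non-empty A of M(A) * min_{i in A} x_i."""
    return sum(value * min(x[i - 1] for i in a) for a, value in m.items() if a)


def additivity_order(m, tol=1e-12):
    """Largest |A| with M(A) != 0, i.e. the k for which v is k-additive."""
    return max((len(a) for a, value in m.items() if abs(value) > tol), default=0)


V_EXAMPLE = {
    frozenset(): 0.0,
    frozenset({1}): 0.1, frozenset({2}): 0.2, frozenset({3}): 0.3,
    frozenset({1, 2}): 0.5, frozenset({1, 3}): 0.6, frozenset({2, 3}): 0.7,
    frozenset({1, 2, 3}): 1.0,
}

M_EXAMPLE = mobius_transform(V_EXAMPLE, 3)
choquet_value = choquet_from_mobius(M_EXAMPLE, (0.8, 0.1, 0.6))
```

For this measure the Möbius values sum to v(N) = 1, the coefficient of the full set is non-zero (so the measure is 3-additive, not 2-additive), and the Choquet integral of (0.8, 0.1, 0.6) agrees with the value 0.42 obtained from the usual sorted-input formula.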


As the variables we will use $m_j = m_A = M(A)$ such that |A| ≤ k in some
appropriate indexing system, such as the one based on cardinality ordering
introduced earlier. This is a much reduced set of variables ($\sum_{i=1}^{k} \binom{n}{i} - 1$ compared to
$2^n - 2$). Now, monotonicity of a fuzzy measure, expressed as

$$v(A \cup \{i\}) - v(A) \ge 0, \quad \forall A \subseteq N,\ i \notin A,\ i = 1, \ldots, n,$$

converts into (2.54), and using k-additivity, into

$$\sum_{B \subseteq A \,|\, i \in B,\ |B| \le k} m_B \ge 0, \quad \text{for all } A \subseteq N \text{ and all } i \in A.$$

The (non-redundant) set of non-negativity constraints v({i}) ≥ 0, i = 1, ..., n,
is a special case of the previous formula when A is a singleton, which simply
becomes (see Note 2.81)

$$\sum_{B = \{i\}} m_B = m_{\{i\}} \ge 0, \quad i = 1, \ldots, n.$$

Finally, condition v(N) = 1 becomes $\sum_{B \subseteq N \,|\, |B| \le k} m_B = 1$.
Then the problem (2.74) is translated into a simplified QP problem

$$\text{minimize } \sum_{j=1}^{K} \left( \sum_{A \,|\, |A| \le k} h_A(\mathbf{x}_j)\, m_A - y_j \right)^2, \qquad (2.76)$$

$$\text{s.t. } \sum_{B \subseteq A \,|\, i \in B,\ |B| \le k} m_B \ge 0, \quad \text{for all } A \subseteq N,\ |A| > 1, \text{ and all } i \in A,$$
$$m_{\{i\}} \ge 0, \quad i = 1, \ldots, n,$$
$$\sum_{B \subseteq N \,|\, |B| \le k} m_B = 1,$$

where $h_A(\mathbf{x}) = \min_{i \in A} x_i$. Note that only the specified $m_B$ are non-negative,
others are unrestricted. The number of monotonicity constraints is the same
for all k-additive fuzzy measures for k = 2, ..., n. Similarly, a simplified LP
problem is obtained from (2.75), with the same set of constraints as in (2.76).

A software package Kappalab provides a number of tools to calculate various
quantities using fuzzy measures, such as the Möbius transform, interaction
indices, k-additive fuzzy measures in various representations, and also allows
one to fit values of fuzzy measures to empirical data. This package is available
from http://www.polytech.univ-nantes.fr/kappalab and it works under
the R environment [145]. A set of C++ algorithms for the same purpose is
available from http://www.deakin.edu.au/~gleb/aotool.html.


2.6.7 Generalized Choquet integral 


Mesiar has proposed a generalization of the Choquet integral, called 
Choquet-like integral. 


Definition 2.145 (Generalized discrete Choquet integral). Let g :
[0, 1] → [−∞, ∞] be a continuous strictly monotone function. A generalized
Choquet integral with respect to a fuzzy measure v is the function

$$C_{v,g}(\mathbf{x}) = g^{-1}\left( C_v(g(\mathbf{x})) \right),$$

where $C_v$ is the discrete Choquet integral with respect to v and g(x) =
$(g(x_1), \ldots, g(x_n))$.


A special case of this construction is

$$C_{v,q}(\mathbf{x}) = \left( \sum_{i=1}^{n} x_{(i)}^{q} \left[ v(H_i) - v(H_{i+1}) \right] \right)^{1/q}, \quad q \in \mathbb{R}. \qquad (2.77)$$

It is not difficult to see that this is equivalent to

$$C_{v,q}(\mathbf{x}) = \left( \sum_{i=1}^{n} \left[ x_{(i)}^{q} - x_{(i-1)}^{q} \right] v(H_i) \right)^{1/q}. \qquad (2.78)$$
The generalized Choquet integral depends on the properties of the fuzzy
measure v, discussed in this Chapter. For additive fuzzy measures it becomes
a weighted quasi-arithmetic mean with the generating function g, and for
symmetric fuzzy measures, it becomes a generalized OWA function, with the
generating function g. Continuity of $C_{v,g}$ holds if Ran(g) ≠ [−∞, ∞].

Fitting the generalized Choquet integral to empirical data involves a
modification of problems (2.74), (2.75) or (2.76), which consists in applying g to
the components of $\mathbf{x}_k$ and $y_k$ (i.e., using the data $\{g(\mathbf{x}_k), g(y_k)\}$) provided g
is known, or is fixed. In the case of fitting both g and v to the data, we use
bi-level optimization, similar to that in Section 2.3.7.

The Choquet integral, as well as the Sugeno integral treated in the next
section, are special cases of more general integrals with respect to a fuzzy
measure. The interested reader is referred to [179].

2.7 Sugeno Integral 


2.7.1 Definition and properties 


Similarly to the Choquet integral, the Sugeno integral is also frequently used to
aggregate inputs, such as preferences in multicriteria decision making [164].
Various important classes of aggregation functions, such as medians, weighted 
minimum and weighted maximum are special cases of Sugeno integral. 


Definition 2.146 (Discrete Sugeno integral). The Sugeno integral with
respect to a fuzzy measure v is given by

$$S_v(\mathbf{x}) = \max_{i = 1, \ldots, n} \min\{x_{(i)}, v(H_i)\}, \qquad (2.79)$$

where $\mathbf{x}_{\nearrow} = (x_{(1)}, x_{(2)}, \ldots, x_{(n)})$ is a non-decreasing permutation of the input
x, and $H_i = \{(i), \ldots, (n)\}$.


Sugeno integrals can be expressed, for arbitrary fuzzy measures, by means
of the Median function (see Section 2.8.1 below) in the following way:

$$S_v(\mathbf{x}) = \mathrm{Med}(x_1, \ldots, x_n, v(H_2), v(H_3), \ldots, v(H_n)).$$

Let us denote max by ∨ and min by ∧ for compactness. We denote by
x ∨ y = z the componentwise maximum of x, y (i.e., $z_i = \max(x_i, y_i)$), and
by x ∧ y their componentwise minimum.


Main properties 


• Sugeno integral is a continuous idempotent aggregation function;
• An aggregation function is a Sugeno integral if and only if it is
  min-homogeneous, i.e., $S_v(x_1 \wedge r, \ldots, x_n \wedge r) = S_v(x_1, \ldots, x_n) \wedge r$, and
  max-homogeneous, i.e., $S_v(x_1 \vee r, \ldots, x_n \vee r) = S_v(x_1, \ldots, x_n) \vee r$, for all
  x ∈ [0, 1]^n, r ∈ [0, 1] (see [163], Th. 4.3; there are also alternative
  characterizations);
• Sugeno integral is comonotone maxitive and comonotone minimitive, i.e.,
  $S_v(\mathbf{x} \vee \mathbf{y}) = S_v(\mathbf{x}) \vee S_v(\mathbf{y})$ and $S_v(\mathbf{x} \wedge \mathbf{y}) = S_v(\mathbf{x}) \wedge S_v(\mathbf{y})$ for all
  comonotone (see footnote 27) x, y ∈ [0, 1]^n;
• Sugeno integral is Lipschitz-continuous, with the Lipschitz constant 1 in
  any p-norm, which means it is a kernel aggregation function, see Definition
  1.62, p. 23;
• The class of Sugeno integrals is closed under duality.


Calculation 


Calculation of the discrete Sugeno integral is performed using Equation (2.79)
similarly to calculating the Choquet integral on p. 96. We take the vector of
pairs $((x_1, 1), (x_2, 2), \ldots, (x_n, n))$, where the second component of each pair
is just the index i of $x_i$. The second component will help keeping track of all
permutations.


Calculation of $S_v(\mathbf{x})$.

1. Sort the components of $((x_1, 1), (x_2, 2), \ldots, (x_n, n))$ with respect to the
   first component of each pair in non-decreasing order. We obtain
   $((x_{(1)}, i_1), (x_{(2)}, i_2), \ldots, (x_{(n)}, i_n))$, so that $x_{(j)} = x_{i_j}$ and $x_{(j)} \le x_{(j+1)}$
   for all j.
2. Let T = {1, ..., n}, and S = 0.
3. For j = 1, ..., n do
   a) S := max(S, min($x_{(j)}$, v(T)));
   b) T := T \ {$i_j$}.
4. Return S.


27 See footnote 18.


Example 2.147. Let n = 3, let the values of v be given by

v(∅) = 0, v(N) = 1, v({1}) = 0.5, v({2}) = v({3}) = 0,
v({1, 2}) = v({2, 3}) = 0.5, v({1, 3}) = 1,

and x = (0.8, 0.1, 0.6).

Step 1. We take ((0.8, 1), (0.1, 2), (0.6, 3)).
        Sort this vector of pairs to obtain ((0.1, 2), (0.6, 3), (0.8, 1)).
Step 2. Take T = {1, 2, 3} and S = 0.
Step 3. a) S := max(0, min(0.1, v({1, 2, 3}))) = 0.1;
        b) T := {1, 2, 3} \ {2} = {1, 3};
        a) S := max(0.1, min(0.6, v({1, 3}))) = 0.6;
        b) T := {1, 3} \ {3} = {1};
        a) S := max(0.6, min(0.8, v({1}))) = 0.6.
Therefore $S_v(\mathbf{x})$ = 0.6.
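The sort-and-scan procedure can be written in a few lines. The sketch below assumes the fuzzy measure is stored as a dictionary keyed by `frozenset` and that criterion i has input value `x[i - 1]`; the names `sugeno_integral` and `V_SUGENO` are illustrative. The data are those of the worked example above.

```python
def sugeno_integral(v, x):
    """Discrete Sugeno integral of x with respect to the fuzzy measure v."""
    # Step 1: pair each value with its criterion index and sort by value.
    pairs = sorted((value, index) for index, value in enumerate(x, start=1))
    # Step 2: start from the full set of criteria and S = 0.
    remaining = set(range(1, len(x) + 1))
    result = 0.0
    # Step 3: scan in non-decreasing order of the inputs.
    for value, index in pairs:
        result = max(result, min(value, v[frozenset(remaining)]))
        remaining.remove(index)
    # Step 4.
    return result


V_SUGENO = {
    frozenset(): 0.0,
    frozenset({1}): 0.5, frozenset({2}): 0.0, frozenset({3}): 0.0,
    frozenset({1, 2}): 0.5, frozenset({2, 3}): 0.5, frozenset({1, 3}): 1.0,
    frozenset({1, 2, 3}): 1.0,
}

sugeno_value = sugeno_integral(V_SUGENO, (0.8, 0.1, 0.6))
```

Running the function on x = (0.8, 0.1, 0.6) reproduces the value 0.6 of the worked example. Only n of the 2^n values of v are looked up per evaluation, so the cost is dominated by the sort.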


Special cases 


• The Sugeno integral with respect to a symmetric fuzzy measure given by
  v(A) = v(|A|) is the Median $\mathrm{Med}(x_1, \ldots, x_n, v(n-1), v(n-2), \ldots, v(1))$.
• Weighted maximum (max-min) $WMAX_w(\mathbf{x}) = \max_{i=1,\ldots,n} \min\{w_i, x_i\}$. An
  aggregation function is a weighted maximum if and only if it is the Sugeno
  integral with respect to a possibility measure;
• Weighted minimum (min-max) $WMIN_w(\mathbf{x}) = \min_{i=1,\ldots,n} \max\{w_i, x_i\}$. An
  aggregation function is a weighted minimum if and only if it is the Sugeno
  integral with respect to a necessity measure;
• Ordered weighted maximum $OWMAX_w(\mathbf{x}) = \max_{i=1,\ldots,n} \min\{w_i, x_{(i)}\}$ with a
  non-increasing weighting vector $1 = w_1 \ge w_2 \ge \ldots \ge w_n$. An aggregation
  function is an ordered weighted maximum if and only if it is the Sugeno
  integral with respect to a symmetric fuzzy measure. It can be expressed
  by means of the Median function as

  $$OWMAX_w(\mathbf{x}) = \mathrm{Med}(x_1, \ldots, x_n, w_2, \ldots, w_n);$$

• Ordered weighted minimum $OWMIN_w(\mathbf{x}) = \min_{i=1,\ldots,n} \max\{w_i, x_{(i)}\}$ with a
  non-increasing weighting vector $w_1 \ge w_2 \ge \ldots \ge w_n = 0$. An aggregation
  function is an ordered weighted minimum if and only if it is the Sugeno
  integral with respect to a symmetric fuzzy measure. It can be expressed
  by means of the Median function as

  $$OWMIN_w(\mathbf{x}) = \mathrm{Med}(x_1, \ldots, x_n, w_1, \ldots, w_{n-1});$$

• The Sugeno integral coincides with the Choquet integral if v is a boolean
  fuzzy measure.


Note 2.148. The weighting vectors in weighted maximum and minimum do not
satisfy $\sum w_i = 1$, but max(w) = 1 and min(w) = 0 respectively.


Note 2.149. For the weighted maximum $WMAX_w$ (v is a possibility measure) it
holds $WMAX_w(\mathbf{x} \vee \mathbf{y}) = WMAX_w(\mathbf{x}) \vee WMAX_w(\mathbf{y})$ for all vectors x, y ∈ [0, 1]^n
(a stronger property than comonotone maxitivity). For the weighted minimum
$WMIN_w$ (v is a necessity measure) it holds $WMIN_w(\mathbf{x} \wedge \mathbf{y}) = WMIN_w(\mathbf{x}) \wedge
WMIN_w(\mathbf{y})$ for all x, y ∈ [0, 1]^n.

Note 2.150. Ordered weighted maximum and minimum functions are related through
their weights as follows: $OWMAX_w(\mathbf{x}) = OWMIN_u(\mathbf{x})$ if and only if $u_i =
w_{i+1}$, i = 1, ..., n − 1 [163].


Example 2.151. Let v be a fuzzy measure on {1, 2} defined by v({1}) = a ∈
[0, 1] and v({2}) = b ∈ [0, 1]. Then

$$S_v(\mathbf{x}) = \begin{cases} x_1 \vee (b \wedge x_2), & \text{if } x_1 \le x_2, \\ (a \wedge x_1) \vee x_2, & \text{if } x_1 \ge x_2. \end{cases}$$


2.8 Medians and order statistics 


2.8.1 Median 


In statistics, the median of a sample is a number dividing the higher half
of a sample from the lower half. The median of a finite list of numbers can
be found by arranging all the numbers in increasing or decreasing order and 
picking the middle one. If the number of inputs is even, one takes the mean 
of the two middle values. 

The median is a type of average which is more representative of a “typical” 
value than the mean. It essentially discards very high and very low values 
(outliers). For example, the median price of houses is often reported in the 
real estate market, because the mean can be influenced by just one or a few 
very expensive houses, and will not represent the cost of a “typical” house in 
the area. 


Definition 2.152 (Median). The median is the function

$$\mathrm{Med}(\mathbf{x}) = \begin{cases} \frac{1}{2}\left(x_{(k)} + x_{(k+1)}\right), & \text{if } n = 2k \text{ is even}, \\ x_{(k)}, & \text{if } n = 2k - 1 \text{ is odd}, \end{cases}$$

where $x_{(k)}$ is the k-th largest (or smallest) component of x.


Note 2.153. The median can be conveniently expressed as an OWA function with a
special weighting vector. For an odd n let $w_{\frac{n+1}{2}} = 1$ and all other $w_i = 0$, and for
an even n let $w_{\frac{n}{2}} = w_{\frac{n}{2}+1} = \frac{1}{2}$, and all other $w_i = 0$. Then $\mathrm{Med}(\mathbf{x}) = OWA_w(\mathbf{x})$.


Definition 2.154 (a-Median). Given a value a ∈ [0, 1], the a-median is the
function

$$\mathrm{Med}_a(\mathbf{x}) = \mathrm{Med}(x_1, \ldots, x_n, \underbrace{a, \ldots, a}_{n-1 \text{ times}}).$$


Note 2.155. a-medians are also the limiting cases of idempotent nullnorms.
They have absorbing element a and are continuous, symmetric and
associative (and, hence, bisymmetric). They can be expressed as

$$\mathrm{Med}_a(\mathbf{x}) = \begin{cases} \max(\mathbf{x}), & \text{if } \mathbf{x} \in [0, a]^n, \\ \min(\mathbf{x}), & \text{if } \mathbf{x} \in [a, 1]^n, \\ a & \text{otherwise.} \end{cases}$$

Fig. 2.25. 3D plots of a-medians $\mathrm{Med}_{0.2}$ and $\mathrm{Med}_{0.5}$.

An attractive property of the medians is that they are applicable to in- 
puts given on the ordinal scale, i.e., when only the ordering, rather than the 
numerical values matter. For example, one can use medians for aggregation 
of inputs like labels of fuzzy sets, such as very high, high, medium, low and 
very low. 

The concept of the weighted median was treated in detail in [268].


Definition 2.156 (Weighted median). Let w be a weighting vector, and
let u denote the vector obtained from w by arranging its components in the
order induced by the components of the input vector x, such that $u_k = w_i$ if
$x_i = x_{(k)}$ is the k-th largest component of x. The lower weighted median is
the function

$$\mathrm{Med}_w(\mathbf{x}) = x_{(k)}, \qquad (2.80)$$

where k is the index obtained from the condition

$$\sum_{j=1}^{k-1} u_j < \frac{1}{2} \quad \text{and} \quad \sum_{j=1}^{k} u_j \ge \frac{1}{2}. \qquad (2.81)$$

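The upper weighted median is the function (2.80) where k is the index obtained
from the condition

$$\sum_{j=1}^{k-1} u_j \le \frac{1}{2} \quad \text{and} \quad \sum_{j=1}^{k} u_j > \frac{1}{2}.$$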
Note 2.157. It is convenient to describe calculation of $\mathrm{Med}_w(\mathbf{x})$ using the following
procedure. Take the vector of pairs $((x_1, w_1), (x_2, w_2), \ldots, (x_n, w_n))$ and sort them in
the order of decreasing x. We obtain $((x_{(1)}, u_1), (x_{(2)}, u_2), \ldots, (x_{(n)}, u_n))$. Calculate
the index k from the condition (2.81). Return $x_{(k)}$.


Note 2.158. The weighted median can be obtained by using the penalty-based
construction outlined on p. 298.
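This procedure amounts to a sort followed by a running sum. The sketch below implements the lower weighted median, taking k to be the first position (in decreasing order of the inputs) at which the cumulative weight reaches 1/2, so that the preceding partial sum is still below 1/2. The function name is illustrative, and the weights are assumed to be non-negative and to sum to one.

```python
def lower_weighted_median(x, w):
    """Lower weighted median of inputs x with weights w (sum(w) == 1 assumed)."""
    # Sort the pairs (x_i, w_i) in the order of decreasing x.
    pairs = sorted(zip(x, w), key=lambda pair: pair[0], reverse=True)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        # First k with u_1 + ... + u_k >= 1/2; the previous sum was < 1/2.
        if cumulative >= 0.5:
            return value
    return pairs[-1][0]  # only reached if rounding keeps the sum below 1/2


median_balanced = lower_weighted_median((0.2, 0.9, 0.5), (0.3, 0.3, 0.4))
median_dominated = lower_weighted_median((0.2, 0.9, 0.5), (0.6, 0.2, 0.2))
```

In the first call the cumulative weights are 0.3 and then 0.7, so the middle input 0.5 is returned; in the second, the input 0.2 carries more than half of the total weight and is returned regardless of the other values. Since only comparisons of the inputs are used, the same code applies to ordinal labels encoded as ranks.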


Main properties

The properties of the weighted median are consistent with averaging functions:

• Weighted median is a continuous idempotent aggregation function;
• If all the weights $w_i = \frac{1}{n}$, weighted median becomes the ordinary median
  Med;
• If any weight $w_i = 0$, then

  $$\mathrm{Med}_w(\mathbf{x}) = \mathrm{Med}_{(w_1, \ldots, w_{i-1}, w_{i+1}, \ldots, w_n)}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n),$$

  i.e., the input $x_i$ can be dropped from the aggregation procedure;
• If any input value is repeated, one can use just a copy of this value and
  add the corresponding weights, namely if $x_i = x_j$ for some i < j, then

  $$\mathrm{Med}_w(\mathbf{x}) = \mathrm{Med}_{\tilde{w}}(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n),$$

  where $\tilde{w} = (w_1, \ldots, w_{i-1}, w_i + w_j, w_{i+1}, \ldots, w_{j-1}, w_{j+1}, \ldots, w_n)$.

As far as learning the weights of weighted medians from empirical data is concerned,
Yager [268] presented a gradient based local optimization algorithm. Given
that such a method does not guarantee the globally optimal solution, it is
advisable to combine it with a generic global optimization scheme, such as
multistart local search or simulated annealing.

Based on the weighted median, Yager also defined an ordinal OWA
function, using the following construction. We recall (see Section 2.5) that
$OWA_w(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x}_{\searrow} \rangle$, i.e., the weighted mean of the vector $\mathbf{x}_{\searrow}$. By
replacing the weighted mean with the weighted median we obtain


Definition 2.159 (Ordinal OWA). The ordinal OWA function is

$$OOWA_{\mathbf{w}}(\mathbf{x}) = Med_{\mathbf{w}}(\mathbf{x}_{\searrow}).$$


Note 2.160. Since the components of the argument of the weighted median in
Definition 2.159 are already ordered, calculation of the ordinal OWA is reduced
to the formula

$$OOWA_{\mathbf{w}}(\mathbf{x}) = x_{(k)},$$

where k is the index obtained from the condition

$$\sum_{j=1}^{k-1} w_j < \frac{1}{2} \quad \text{and} \quad \sum_{j=1}^{k} w_j \geq \frac{1}{2}.$$
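As an illustration of the ordinal OWA on a purely ordinal scale, the following sketch aggregates linguistic labels. The label scale, the function name `ordinal_owa` and the example weights are assumptions made for this example only; the weights are attached to positions in the decreasingly ordered input, as in the OWA.

```python
# Illustrative ordinal scale, listed from worst to best (an assumption).
SCALE = ["very low", "low", "medium", "high", "very high"]


def ordinal_owa(labels, w, scale=SCALE):
    """Ordinal OWA of the given labels with weighting vector w.

    Assumption: the components of w are non-negative and sum to one.
    """
    rank = {label: position for position, label in enumerate(scale)}
    # Order the inputs from the largest to the smallest label.
    ordered = sorted(labels, key=rank.__getitem__, reverse=True)
    cumulative = 0.0
    for label, weight in zip(ordered, w):
        previous = cumulative
        cumulative += weight
        # First position whose cumulative weight reaches 1/2.
        if previous < 0.5 <= cumulative:
            return label
    raise ValueError("the weights must sum to one")


print(ordinal_owa(["low", "very high", "medium", "high"], [0.1, 0.2, 0.3, 0.4]))
```

Weighting vectors that concentrate weight on the first positions push the result towards the largest label, and those that concentrate weight on the last positions push it towards the smallest one.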


A more general class of aggregation functions on an ordinal scale is that 
of weighted ordinal means, presented in iag. 
2.8.2 Order statistics


Definition 2.161 (Order statistic). The k-th order statistic is the function

$$kOS(\mathbf{x}) = x_{(k)},$$

i.e., its value is the k-th smallest component of $\mathbf{x}$ (so here the
components are indexed in increasing order).


Note 2.162. The order statistics can be conveniently expressed as OWA functions
with special weighting vectors. Let $\mathbf{w} = (0,0,\ldots,0,1,0,\ldots,0)$,
i.e., $w_i = 0$ for $i \neq n-k+1$ and $w_{n-k+1} = 1$. Then
$kOS(\mathbf{x}) = OWA_{\mathbf{w}}(\mathbf{x})$.
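The relation between order statistics and OWA functions can be checked with a short sketch. The helper names `owa` and `order_statistic` are ours; the OWA is assumed to use the decreasing ordering of the inputs, so the single non-zero weight sits at position n-k+1.

```python
def owa(x, w):
    """OWA function: weighted mean of the inputs sorted in decreasing order."""
    ordered = sorted(x, reverse=True)
    return sum(weight * value for weight, value in zip(w, ordered))


def order_statistic(x, k):
    """k-th smallest component of x (k = 1..n), written as an OWA function."""
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(x)")
    w = [0.0] * n
    w[n - k] = 1.0  # zero-based position of the weight w_{n-k+1}
    return owa(x, w)


x = [0.7, 0.1, 0.4]
print([order_statistic(x, k) for k in (1, 2, 3)])
```

With k = 1 the function returns the minimum and with k = n the maximum.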
Conjunctive and Disjunctive Functions


3.1 Semantics 


As their names imply, conjunctive aggregation functions model conjunction
(i.e., the logical AND), and disjunctive aggregation functions model
disjunction (logical OR). Consider the situation where various fuzzy criteria
are aggregated as in


If $t_1$ is $A_1$ AND $t_2$ is $A_2$ AND ... $t_n$ is $A_n$ THEN ...


Conjunction does not allow for compensation: low scores for some criteria 
cannot be compensated by other scores. If to obtain a driving license one has 
to pass both theory and driving tests, no matter how well one does in the 
theory test, it does not compensate for failing the driving test. 

Thus it is the smallest value of any of the inputs which bounds the output 
value (from above). Hence we have the definition 


Definition 3.1 (Conjunctive aggregation). An aggregation function f 
has conjunctive behavior (or is conjunctive) if for every x it is bounded by 


$$f(\mathbf{x}) \leq \min(\mathbf{x}) = \min(x_1, x_2, \ldots, x_n).$$


For disjunctive aggregation functions we have exactly the opposite. Sat- 
isfaction of any of the criteria is enough by itself, although more than one 
positive input pushes the total up. For example, both a wide open door (when 
you come home) and the sound of the alarm are indicators of a burglary, and 
either one is sufficient to raise suspicion. We may formalize this in a logical 
statement “If the door is open OR the alarm sounds, THEN it may be a bur- 
glary”. But if both happen at the same time, they reinforce each other, and 
our suspicion is stronger than the suspicion caused by any of these indicators 
by itself (i.e., greater than their maximum). We define disjunctive aggregation
as

Definition 3.2 (Disjunctive aggregation). An aggregation function f has
disjunctive behavior (or is disjunctive) if for every $\mathbf{x}$ it is
bounded by


$$f(\mathbf{x}) \geq \max(\mathbf{x}) = \max(x_1, x_2, \ldots, x_n).$$
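The two bounds in Definitions 3.1 and 3.2 are easy to test numerically. The sketch below samples a regular grid, so it can only refute conjunctive or disjunctive behavior, not prove it; the helper names, the grid size and the tolerance are assumptions of this example.

```python
from itertools import product as grid_points


def _grid(n, steps):
    ticks = [i / (steps - 1) for i in range(steps)]
    return grid_points(ticks, repeat=n)


def looks_conjunctive(f, n=2, steps=21, tol=1e-12):
    """True if f(x) <= min(x) at every grid point (up to rounding)."""
    return all(f(*x) <= min(x) + tol for x in _grid(n, steps))


def looks_disjunctive(f, n=2, steps=21, tol=1e-12):
    """True if f(x) >= max(x) at every grid point (up to rounding)."""
    return all(f(*x) >= max(x) - tol for x in _grid(n, steps))


def t_product(x, y):
    return x * y


def prob_sum(x, y):
    return x + y - x * y


def mean(x, y):
    return (x + y) / 2


print(looks_conjunctive(t_product), looks_disjunctive(prob_sum))
print(looks_conjunctive(mean), looks_disjunctive(mean))
```

The arithmetic mean fails both tests, which is consistent with it being an averaging function.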


3.2 Duality 


Recall from Chapter 1 that given a strong negation N (Definition 1.48), any
aggregation function f has an N-dual aggregation function $f_d$ associated to it
(Definition 1.54), and that conjunctive and disjunctive aggregation functions
are dual to each other:


Proposition 3.3. If f is a conjunctive aggregation function, its N-dual $f_d$
(with respect to any strong negation N) is a disjunctive aggregation function,
and vice versa.


Typically the standard negation N(t) = 1 − t is used, and then $f_d$ is called
simply the dual of f, although one is not restricted to this choice.


Example 3.4. The dual of the minimum is the maximum. The dual of the product
$T_P$ is the dual product $S_P$ (also called probabilistic sum)

$$T_P(\mathbf{x}) = \prod_{i=1}^{n} x_i, \qquad S_P(\mathbf{x}) = 1 - \prod_{i=1}^{n} (1 - x_i).$$

Duality is very convenient, as it allows one to study only conjunctive
aggregation functions, and then obtain the analogous results for disjunctive
functions by duality.
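Duality is also convenient computationally: one implementation of a conjunctive function yields its disjunctive counterpart. The sketch below assumes the formula $f_d(\mathbf{x}) = N(f(N(x_1),\ldots,N(x_n)))$ for a strong negation N and defaults to the standard negation; the helper name `dual` is ours.

```python
import math


def dual(f, negation=lambda t: 1.0 - t):
    """N-dual of f, assuming the negation is strong (its own inverse)."""
    def f_d(*x):
        return negation(f(*(negation(t) for t in x)))
    return f_d


def t_product(*x):
    return math.prod(x)


prob_sum = dual(t_product)  # 1 - (1 - x_1) * ... * (1 - x_n)
maximum = dual(min)         # the dual of the minimum

print(prob_sum(0.2, 0.5), maximum(0.2, 0.7))
```

Applying `dual` twice returns a function that agrees with the original one up to rounding, which reflects the involutive nature of a strong negation.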


Note 3.5. Some aggregation functions are self-dual, i.e., f(x) = N(f(N(x))). We
study them in detail in Chapter 4. However, there are no self-dual conjunctive or
disjunctive aggregation functions.


3.3 Generalized OR and AND — functions 


It follows from their definitions that 0 is the absorbing element of a con- 
junctive aggregation function, and 1 is the absorbing element of a disjunctive 
aggregation function. 

Conjunctive and disjunctive aggregation may or may not be symmetric. A 
neutral element may or may not exist. If it exists, then e = 1 for conjunctive 
aggregation and e = 0 for disjunctive. But if the neutral element is e = 1, then 
the aggregation function is necessarily conjunctive. Similarly, if the neutral 
element is e = 0, then the aggregation function is necessarily disjunctive. The 
existence of a neutral element seems quite logical, and is postulated in many
studies.

Definition 3.6 (Semicopula). An aggregation function f is called a
semicopula if it has the neutral element e = 1. Its dual (an aggregation
function with the neutral element e = 0) is called a dual semicopula.


Evidently, semicopulas are conjunctive, and their duals are disjunctive. 

The prototypical conjunctive and disjunctive aggregation functions are the
minimum and maximum: they are the limiting cases, and form the boundary
with the averaging functions.


Example 3.7. The aggregation function $f(x_1, x_2) = x_1 x_2$ is conjunctive,
symmetric, and has neutral element e = 1. It is a semicopula. The aggregation
function $f(x_1, x_2) = x_1^2 x_2^2$ is also conjunctive and symmetric, but it
has no neutral element. The function $f(x_1, x_2) = x_1 x_2^2$ is conjunctive,
asymmetric, and has no neutral element.


Consider now the Lipschitz property (see Definition 1.58), and let us
concentrate on 1-Lipschitz aggregation functions (Definition 1.61).


Definition 3.8 (Quasi-copula). An aggregation function f is called a 
quasi-copula, if it is 1-Lipschitz, and has neutral element e = 1. 


Evidently, each quasi-copula is a semicopula, and hence conjunctive, but 
not the other way around. 


• Conjunctive aggregation functions, semicopulas and quasi-copulas form
  convex classes. That is, any convex combination of two aggregation
  functions $f_1, f_2$ from one class also belongs to that class:

$$f = [a f_1 + (1 - a) f_2] \in \mathcal{F}, \quad \text{if } f_1, f_2 \in \mathcal{F} \text{ and } a \in [0,1].$$

• A pointwise minimum or maximum¹ of two conjunctive aggregation
  functions, semicopulas or quasi-copulas also belongs to the same class.


Another useful property is monotonicity with respect to argument cardi- 
nality. 


Definition 3.9 (Monotonicity with respect to argument cardinality). 
An extended aggregation function F is monotone non-increasing with respect 
to argument cardinality, if 


$$F_n(x_1, \ldots, x_n) \geq F_{n+1}(x_1, \ldots, x_n, x_{n+1})$$


for all n > 1 and any $x_1, \ldots, x_{n+1} \in [0,1]$. F is monotone
non-decreasing with respect to argument cardinality if the sign of the
inequality is reversed. If F is symmetric, then the positions of the inputs do
not matter.


1 Pointwise minimum of two functions f, g means min(f,g)(x) = min(f(x), g(x))
for all x in their (common) domain. Pointwise maximum is defined similarly.

Of course, in general members of the family F need not be related, and
thus extended conjunctive functions do not necessarily verify the condition of
Definition 3.9, but it is reasonable to expect that by adding new inputs to
an extended conjunctive aggregation function, the output can only become
smaller. In the same way, it is reasonable to expect that extended disjunctive
aggregation functions are monotone non-decreasing with respect to argument
cardinality, i.e., adding inputs can only reinforce the output.²


3.4 Triangular norms and conorms 


Two special and well-known classes of conjunctive and disjunctive aggregation
functions are the triangular norms and conorms.³ Triangular norms were
originally introduced by Menger as operations for the fusion of distribution
functions, needed to generalize the triangle inequality of a metric to
statistical metric spaces. Menger's triangular norms formed a large, rather
heterogeneous class of symmetric bivariate aggregation functions fulfilling
f(1,a) > 0 whenever a > 0. Nowadays the definition of triangular norms due to
Schweizer and Sklar includes associativity and the neutral element e = 1. Note
that associativity allows the extension of the triangle inequality to the
polygonal inequality, and it also means that triangular norms can be applied to
any finite number of inputs, that is, they form extended aggregation functions
as in Definition 1.6. Triangular norms have become especially popular as models
for fuzzy set intersection. They are also applied in studies of probabilistic
metric spaces, many-valued logic, non-additive measures and integrals, etc.
For an exhaustive state-of-the-art overview in the field of triangular norms we 
recommend the recent monographs id, ag. 


3.4.1 Definitions 


Definition 3.10 (Triangular norm (t-norm)). A triangular norm (t-norm for
short) is a bivariate aggregation function $T : [0,1]^2 \to [0,1]$, which
is associative, symmetric and has neutral element 1.


It follows immediately that a t-norm is a conjunctive aggregation function.
Its dual (hence disjunctive) aggregation function is called a triangular conorm.


Definition 3.11 (Triangular conorm (t-conorm)). A triangular conorm
(t-conorm for short) is a bivariate aggregation function $S : [0,1]^2 \to [0,1]$,
which is associative, symmetric and has neutral element 0.


2 Consider adding pieces of positive evidence in a court trial. Of course, there
could be negative evidence, in which case the aggregation is of mixed type, see
Chapter 4.

3 Triangular conorms are also known as s-norms, see e.g., id.

As with any conjunctive and disjunctive aggregation function, each t-norm
T and each t-conorm S have respectively 0 and 1 as absorbing elements.


Example 3.12. The four basic examples of t-norms and t-conorms are the
following:


1. $T_{\min}(x, y) = \min(x, y)$ (minimum) and $S_{\max}(x, y) = \max(x, y)$ (maximum).
2. $T_P(x, y) = x \cdot y$ (product) and $S_P(x, y) = x + y - x \cdot y$ (probabilistic sum).
3. $T_L(x, y) = \max(0, x + y - 1)$ (Łukasiewicz t-norm) and
   $S_L(x, y) = \min(1, x + y)$ (Łukasiewicz t-conorm).
4. The drastic product

$$T_D(x, y) = \begin{cases} 0, & \text{if } (x, y) \in [0,1[^2, \\ \min(x, y) & \text{otherwise,} \end{cases} \tag{3.1}$$

   and the drastic sum

$$S_D(x, y) = \begin{cases} 1, & \text{if } (x, y) \in \,]0,1]^2, \\ \max(x, y) & \text{otherwise.} \end{cases} \tag{3.2}$$
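For reference, the four basic t-norms and t-conorms of Example 3.12 translate directly into code. The function names are ours, and the drastic product and drastic sum test the boundary with exact floating-point comparisons, which assumes that the inputs 0 and 1 are passed exactly.

```python
def t_min(x, y):
    return min(x, y)


def t_product(x, y):
    return x * y


def t_lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)


def t_drastic(x, y):
    # Zero unless at least one argument equals 1.
    return min(x, y) if max(x, y) == 1.0 else 0.0


def s_max(x, y):
    return max(x, y)


def s_prob_sum(x, y):
    return x + y - x * y


def s_lukasiewicz(x, y):
    return min(1.0, x + y)


def s_drastic(x, y):
    # One unless at least one argument equals 0.
    return max(x, y) if min(x, y) == 0.0 else 1.0


for t in (t_min, t_product, t_lukasiewicz, t_drastic):
    print(t.__name__, t(0.7, 0.6), t(0.7, 1.0))
```

In each t-norm the second printed value is 0.7 up to rounding, since 1 is the neutral element.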


Note 3.13. Other notations that can be found in the literature (e.g., id, 21l) are 
the following: 

• P, Prod, Π and P*, Prod*, Π* for the product and the dual product.

• W and W* for the Łukasiewicz t-norm and the Łukasiewicz t-conorm.

• Z and Z* for the drastic product and the drastic sum.

The product and Łukasiewicz t-norms are prototypical examples of two
important subclasses of t-norms, namely, strict and nilpotent t-norms, which
are studied in detail in Section 3.4.3. The graphs of these four basic examples
are presented in Figures 3.1 and 3.2.


In the discussion of the properties of t-norms and t-conorms, we will pay 
particular attention to t-norms, and will obtain similar results for t-conorms 
by duality. 

A related class of functions is called triangular subnorms. 


Definition 3.14 (Triangular subnorm (t-subnorm)). A t-subnorm is a
function $f : [0,1]^2 \to [0,1]$, which is non-decreasing, associative,
symmetric and conjunctive.


Evidently, any t-norm is a t-subnorm but not vice versa. For instance the
zero function is a t-subnorm but not a t-norm. t-subnorms are in general
not aggregation functions because f(1,1) = 1 may not hold. One can always
construct a t-norm from any t-subnorm by just enforcing the neutral element,
i.e., by re-defining its values on the set
$\{(x, y) \in [0,1]^2 : x = 1 \text{ or } y = 1\}$. Of course, the resulting
t-norm would generally be discontinuous. The drastic product is an example of
such a construction, where the t-subnorm was the zero function.

The dual of a t-subnorm is called a t-superconorm.


Fig. 3.1. The four basic t-norms: the minimum, product, Łukasiewicz and drastic
product.


Definition 3.15 (Triangular superconorm (t-superconorm)). A t-superconorm
is a function $f : [0,1]^2 \to [0,1]$, which is non-decreasing, associative,
symmetric and disjunctive.


Evidently, any t-conorm is a t-superconorm but not vice versa. We can
obtain a t-conorm from any t-superconorm by enforcing the neutral element
0; for example, the drastic sum was obtained in this way from the function
f(x,y) = 1.


Fig. 3.2. The four basic t-conorms: the maximum, probabilistic sum, Łukasiewicz
and drastic sum.


3.4.2 Main properties 


Because of their associativity, t-norms are defined for any number of
arguments n > 1 (with the usual convention T(t) = t), hence they are extended
aggregation functions. Indeed, associativity implies

$$T(x_1, x_2, x_3) = T(x_1, T(x_2, x_3)) = T(T(x_1, x_2), x_3),$$

and, in general,

$$T(x_1, x_2, \ldots, x_n) = T(x_1, T(x_2, \ldots, x_n)) = T(T(x_1, \ldots, x_{n-1}), x_n).$$


Example 3.16. The n-ary forms of min and product are obvious. For the
Łukasiewicz and the drastic product t-norms we have the following extensions:

$$T_L(x_1, \ldots, x_n) = \max\Big(\sum_{i=1}^{n} x_i - (n - 1),\; 0\Big),$$

$$T_D(x_1, \ldots, x_n) = \begin{cases} x_i, & \text{if } x_j = 1 \text{ for all } j \neq i, \\ 0 & \text{otherwise.} \end{cases}$$
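Associativity means that an n-ary t-norm can be computed by folding the bivariate function over the inputs. The sketch below does this with `functools.reduce` and compares the result with the closed forms of Example 3.16; the helper names are ours.

```python
from functools import reduce


def extend(t_norm):
    """n-ary extension of an associative bivariate function."""
    def t_n(*x):
        if not x:
            raise ValueError("at least one argument is required")
        return reduce(t_norm, x)  # a single argument is returned unchanged
    return t_n


def t_lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)


def t_drastic(x, y):
    return min(x, y) if max(x, y) == 1.0 else 0.0


def t_lukasiewicz_n(*x):
    return max(sum(x) - (len(x) - 1), 0.0)


def t_drastic_n(*x):
    below_one = [v for v in x if v < 1.0]
    if not below_one:
        return 1.0           # all inputs equal 1
    if len(below_one) == 1:
        return below_one[0]  # every other input equals 1
    return 0.0


print(extend(t_lukasiewicz)(0.9, 0.8, 0.7), t_lukasiewicz_n(0.9, 0.8, 0.7))
print(extend(t_drastic)(1.0, 0.3, 1.0), t_drastic_n(1.0, 0.3, 1.0))
```

The folded and closed-form Łukasiewicz values may differ in the last few digits because the additions are performed in a different order.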


We can derive the following properties by some basic algebra.


• Every t-norm satisfies the boundary conditions: for each $t \in [0,1]$:

$$T(t, 0) = T(0, t) = 0, \tag{3.3}$$

$$T(t, 1) = T(1, t) = t, \tag{3.4}$$

  which are just restatements of the fact that a t-norm has absorbing element
  0 and neutral element 1.

• The weakest and the strongest t-norm are the drastic product and the
  minimum: $T_D(\mathbf{x}) \leq T(\mathbf{x}) \leq \min(\mathbf{x})$ for every
  $\mathbf{x} \in [0,1]^n$ and every t-norm T.

• The only idempotent t-norm is the minimum.

• From their definitions we can deduce that

$$T_D(\mathbf{x}) \leq T_L(\mathbf{x}) \leq T_P(\mathbf{x}) \leq \min(\mathbf{x})$$

  for every $\mathbf{x} \in [0,1]^n$.

• t-norms are monotone non-increasing with respect to argument cardinality:

$$T_{n+1}(x_1, \ldots, x_{n+1}) \leq T_n(x_1, \ldots, x_n) \quad \text{for all } n > 1.$$

• t-norms may or may not be continuous. The drastic product is an example
  of a discontinuous t-norm. Minimum is a continuous t-norm.

• The Lipschitz constant of any Lipschitz t-norm is at least one: M ≥ 1 (see
  Definition 1.58).

• Not all t-norms are comparable (Definition 1.56).

• A pointwise minimum or maximum⁴ of two t-norms is not generally a
  t-norm, although it is a conjunctive aggregation function.

• A linear combination of t-norms $aT_1(\mathbf{x}) + bT_2(\mathbf{x})$,
  $a, b \in \mathbb{R}$, is not generally a t-norm, although it is a conjunctive
  aggregation function if $a, b \in [0,1]$, $b = 1 - a$.
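Two of the properties listed above, the ordering of the basic t-norms and monotonicity with respect to argument cardinality, can be spot-checked on a grid. This is a numerical sanity check under our own choice of grid and tolerance, not a proof.

```python
from itertools import product as grid_points
from math import prod


def t_drastic(x, y):
    return min(x, y) if max(x, y) == 1.0 else 0.0


def t_lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)


def t_product(x, y):
    return x * y


TOL = 1e-12  # absorbs rounding errors in the comparisons
TICKS = [i / 20 for i in range(21)]


def ordering_holds():
    """Check T_D <= T_L <= T_P <= min at every grid point."""
    return all(
        t_drastic(x, y) <= t_lukasiewicz(x, y) + TOL
        and t_lukasiewicz(x, y) <= t_product(x, y) + TOL
        and t_product(x, y) <= min(x, y) + TOL
        for x, y in grid_points(TICKS, repeat=2)
    )


def product_is_cardinality_monotone():
    """Check that adding a third input never increases the product."""
    return all(
        prod(x) <= prod(x[:-1]) + TOL
        for x in grid_points(TICKS[::4], repeat=3)
    )


print(ordering_holds(), product_is_cardinality_monotone())
```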


4 See footnote 1.

The equivalent properties of t-conorms are listed below.


• Every t-conorm satisfies the boundary conditions: for each $t \in [0,1]$:

$$S(t, 1) = S(1, t) = 1, \tag{3.5}$$

$$S(t, 0) = S(0, t) = t, \tag{3.6}$$

  which are just restatements of the fact that a t-conorm has absorbing
  element 1 and neutral element 0.

• The weakest and the strongest t-conorm are the maximum and the drastic
  sum: $\max(\mathbf{x}) \leq S(\mathbf{x}) \leq S_D(\mathbf{x})$ for every
  $\mathbf{x} \in [0,1]^n$ and every t-conorm S.

• The only idempotent t-conorm is the maximum.

• From their definitions we can deduce that

$$\max(\mathbf{x}) \leq S_P(\mathbf{x}) \leq S_L(\mathbf{x}) \leq S_D(\mathbf{x})$$

  for every $\mathbf{x} \in [0,1]^n$.

• t-conorms are monotone non-decreasing with respect to argument cardinality:

$$S_{n+1}(x_1, \ldots, x_{n+1}) \geq S_n(x_1, \ldots, x_n) \quad \text{for all } n > 1.$$

• t-conorms may or may not be continuous. The drastic sum is an example
  of a discontinuous t-conorm. Maximum is a continuous t-conorm.

• The Lipschitz constant of any Lipschitz t-conorm is at least one: M ≥ 1.

• Not all t-conorms are comparable.

• A pointwise minimum or maximum of two t-conorms is not generally a
  t-conorm, although it is a disjunctive aggregation function.

• A linear combination of t-conorms $aS_1(\mathbf{x}) + bS_2(\mathbf{x})$,
  $a, b \in \mathbb{R}$, is not generally a t-conorm, although it is a
  disjunctive aggregation function if $a, b \in [0,1]$, $b = 1 - a$.


The n-ary forms of the maximum and probabilistic sum t-conorms are
obvious. In the case of the Łukasiewicz t-conorm and the drastic sum we have
the following formulae:

$$S_L(x_1, \ldots, x_n) = \min\Big(\sum_{i=1}^{n} x_i,\; 1\Big),$$

$$S_D(x_1, \ldots, x_n) = \begin{cases} x_i, & \text{if } x_j = 0 \text{ for all } j \neq i, \\ 1 & \text{otherwise.} \end{cases}$$


We recall the definitions of zero and one divisors (Definitions 1.34 and
1.37), which for t-norms and t-conorms take the special form

Definition 3.17 (Zero and one divisors). An element $a \in \,]0,1[$ is a zero
divisor of a t-norm T if there exists some $b \in \,]0,1[$ such that T(a,b) = 0.
An element $a \in \,]0,1[$ is a one divisor of a t-conorm S if there exists some
$b \in \,]0,1[$ such that S(a,b) = 1.


Definition 3.18 (Nilpotent element). An element $a \in \,]0,1[$ is a nilpotent
element of a t-norm T if there exists an $n \in \{1,2,\ldots\}$ such that

$$T_n(\underbrace{a, \ldots, a}_{n\text{-times}}) = 0.$$

An element $b \in \,]0,1[$ is a nilpotent element of a t-conorm S if there
exists an $n \in \{1,2,\ldots\}$ such that

$$S_n(\underbrace{b, \ldots, b}_{n\text{-times}}) = 1.$$


Any $a \in \,]0,1[$ is a nilpotent element and also a zero divisor of the
Łukasiewicz t-norm $T_L$, as well as of the drastic product. The minimum and
the product t-norms have neither nilpotent elements nor zero divisors.


Note 3.19. 1. Each nilpotent element a of a t-norm T is also a zero divisor of T,
   but not conversely. For that, it is enough to consider the nilpotent minimum
   defined by

$$T_{nM}(x, y) = \begin{cases} 0, & \text{if } x + y \leq 1, \\ \min(x, y) & \text{otherwise,} \end{cases}$$

   whose set of nilpotent elements is the interval ]0, 0.5] and whose set of
   zero divisors is the interval ]0, 1[.
2. If a is a nilpotent element (a zero divisor) of a t-norm T then each
   $b \in \,]0, a[$ is also a nilpotent element (zero divisor) of T.
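The difference between nilpotent elements and zero divisors for the nilpotent minimum can be explored numerically. In the sketch below the search limits `max_n` and `steps` are arbitrary assumptions, so a negative answer only means that no witness was found within those limits.

```python
def t_nilpotent_min(x, y):
    return 0.0 if x + y <= 1.0 else min(x, y)


def is_nilpotent_element(t_norm, a, max_n=50):
    """Search for n <= max_n with T(a, ..., a) = 0 (n arguments)."""
    value = a  # T(a) = a by convention
    for _ in range(max_n - 1):
        value = t_norm(value, a)
        if value == 0.0:
            return True
    return False


def is_zero_divisor(t_norm, a, steps=1000):
    """Search a grid of b in ]0,1[ for a witness with T(a, b) = 0."""
    return any(t_norm(a, i / steps) == 0.0 for i in range(1, steps))


for a in (0.3, 0.5, 0.75):
    print(a, is_nilpotent_element(t_nilpotent_min, a),
          is_zero_divisor(t_nilpotent_min, a))
```

For a = 0.75 the search finds a zero divisor witness but no nilpotency witness, in agreement with the intervals stated in Note 3.19.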


Proposition 3.20. Any t-norm has zero divisors if and only if it has nilpo- 
tent elements. 


3.4.3 Strict and nilpotent t-norms and t-conorms


Definition 3.21. (Strict t-norm) A t-norm T is called strict if it is
continuous and strictly monotone on $]0,1]^2$, i.e., T(t,u) < T(t,v) whenever
t > 0 and u < v.


Note 3.22. Of course, there are no t-norms (or any conjunctive aggregation
functions) strictly monotone on the whole domain (as they have an absorbing
element f(t,0) = f(0,t) = 0 for all $t \in [0,1]$). Thus strict monotonicity is
relaxed to hold only for $x_i > 0$.


Note 3.23. A strictly increasing t-norm (on $]0,1]^n$) need not be strict (it
can be discontinuous).

Definition 3.24. (Nilpotent t-norm) A t-norm T is called nilpotent if it
is continuous and each element $a \in \,]0,1[$ is a nilpotent element of T,
i.e., if there exists an $n \in \{1,2,\ldots\}$ such that
$T(\underbrace{a, \ldots, a}_{n\text{-times}}) = 0$ for any $a \in \,]0,1[$.


Example 3.25. The product $T_P$ is a strict t-norm and the Łukasiewicz $T_L$ is
a nilpotent t-norm.


Of course, there are t-norms that are neither strict nor nilpotent. However, 
a t-norm cannot be at the same time strict and nilpotent. 


Example 3.26. The drastic product is a non-continuous t-norm for which each
element of ]0,1[ is nilpotent. It is an example of a t-norm that is neither
strict nor nilpotent (because it is discontinuous). The minimum is a continuous
t-norm which is neither strict nor nilpotent. The following t-norm is strictly
monotone on $]0,1]^2$ but it is discontinuous, and hence it is not strict:

$$T(x, y) = \begin{cases} \frac{xy}{2}, & \text{if } (x, y) \in [0,1[^2, \\ \min(x, y) & \text{otherwise.} \end{cases}$$

For triangular conorms we have similar definitions, essentially obtained by
duality.


Definition 3.27. (Strict t-conorm) A t-conorm S is called strict if it is
continuous and strictly increasing on $[0,1[^2$, i.e., S(t,u) < S(t,v) whenever
t < 1 and u < v.


Note 3.28. Of course, there are no t-conorms (or any disjunctive aggregation
functions) strictly increasing on the whole domain (as they have an absorbing
element f(t,1) = f(1,t) = 1 for all $t \in [0,1]$). Thus strict monotonicity is
relaxed to hold only on $[0,1[^n$.


Definition 3.29. (Nilpotent t-conorm) A t-conorm S is called nilpotent
if it is continuous and each element $a \in \,]0,1[$ is a nilpotent element of
S, i.e., if there exists an $n \in \{1,2,\ldots\}$ such that
$S(\underbrace{a, \ldots, a}_{n\text{-times}}) = 1$ for any $a \in \,]0,1[$.


Example 3.30. The probabilistic sum $S_P$ is a strict t-conorm and the
Łukasiewicz $S_L$ is a nilpotent t-conorm. But the drastic sum and the maximum
t-conorms are neither strict nor nilpotent t-conorms.


Note 3.31. A t-conorm S is strict (nilpotent) if and only if its dual t-norm T
is strict (nilpotent).

3.4.4 Archimedean t-norms and t-conorms


Definition 3.32. (Archimedean t-norm) A t-norm is called Archimedean
if for each $(a,b) \in \,]0,1[^2$ there is an $n \in \{1,2,\ldots\}$ with
$T(\underbrace{a, \ldots, a}_{n\text{-times}}) < b$.


This property is equivalent to the limit property, i.e., for all
$t \in \,]0,1[$:
$\lim_{n \to \infty} T_n(\underbrace{t, \ldots, t}_{n\text{-times}}) = 0$.
It also implies that the only idempotent elements (i.e., the values a such
that T(a,a) = a) are 0 and 1.


Definition 3.33. (Archimedean t-conorm) A t-conorm is called Archimedean
if for each $(a,b) \in \,]0,1[^2$ there is an $n \in \{1,2,\ldots\}$ with
$S(\underbrace{a, \ldots, a}_{n\text{-times}}) > b$.


Proposition 3.34. A continuous Archimedean t-norm (t-conorm) is either
strict or nilpotent.


• If a t-norm T (t-conorm S) is strict then it is Archimedean;
• If a t-norm T (t-conorm S) is nilpotent then it is Archimedean.


Archimedean t-norms are usually continuous, although there are some
examples when they are discontinuous. We will concentrate on continuous
t-norms as they play an important role in applications. One special property of
a continuous Archimedean t-norm (or t-conorm) is that it is strictly increasing,
except for the subset where its value is 0 (or 1 for t-conorms).


Proposition 3.35. A continuous t-norm T is Archimedean if and only if it
is strictly increasing on the subset $\{(x,y) \in [0,1]^2 \mid T(x,y) > 0\}$.


Note 3.36. In (1.49), p.27 the authors call this property the conditional
cancelation law: $\forall x, y, z$, $T(x,y) = T(x,z) > 0$ implies $y = z$.


As we shall see in the subsequent sections, continuous Archimedean t-norms
are very useful for the following reasons: a) they form a dense subset in
the set of all continuous t-norms, and b) they can be represented via additive
and multiplicative generators. This representation reduces calculation of a
t-norm (a multivariate function) to calculation of the values of its univariate
generators.

3.4.5 Additive and multiplicative generators


In this section we express t-norms and t-conorms by means of a single real
function $g : [0,1] \to [0,\infty]$ with some specific properties.

Let g be a strictly decreasing bijection. The inverse of g exists and
is denoted by $g^{-1}$, and of course $(g^{-1} \circ g)(t) = (g \circ g^{-1})(t) = t$.
Also Dom $g$ = Ran $g^{-1}$ and Dom $g^{-1}$ = Ran $g$. This is very convenient
for our purposes, since Dom $g^{-1} = [0,\infty]$, and we intend to apply
$g^{-1}$ to a construction involving the values $g(x_1), g(x_2), \ldots$, namely
(see Fig. 3.4)

$$g^{-1}(g(x_1) + g(x_2) + \ldots + g(x_n)),$$

so the argument of $g^{-1}$ can be any non-negative value.

However, we also need a well defined inverse for the case when
$g : [0,1] \to [0,a]$ is a strictly decreasing bijection⁵ and $0 < a < \infty$.
Here again, the inverse $g^{-1}$ exists, but Dom $g^{-1} = [0,a]$, see Fig. 3.3.
We want to have the flexibility to use $g^{-1}$ with any non-negative argument,
so we extend the domain of $g^{-1}$ by using

$$g^{(-1)}(t) = \begin{cases} g^{-1}(t), & \text{if } t \in [0,a], \\ 0 & \text{otherwise.} \end{cases}$$

We call the resulting function the pseudo-inverse.⁶ Note that we can express
$g^{(-1)}$ as

$$g^{(-1)}(t) = \sup\{z \in [0,1] \mid g(z) > t\}$$

for all $t \in [0,\infty]$. This definition covers both cases $a < \infty$ and
$a = \infty$, i.e., applies to any continuous strictly decreasing function
$g : [0,1] \to [0,\infty]$ verifying g(1) = 0.
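When no closed form of the inverse is available, the sup formula above can be evaluated by bisection. The sketch assumes that g is continuous and strictly decreasing on [0,1] with g(1) = 0, and it treats the supremum of the empty set as 0; the function names and the number of iterations are our choices.

```python
import math


def pseudo_inverse_decreasing(g, t, iterations=60):
    """Approximate sup{z in [0,1] : g(z) > t} by bisection.

    Assumptions: g is continuous, strictly decreasing, g(1) = 0, t >= 0.
    """
    if g(0.0) <= t:
        return 0.0  # t lies at or beyond a = g(0): no z satisfies g(z) > t
    low, high = 0.0, 1.0  # invariant: g(low) > t >= g(high)
    for _ in range(iterations):
        middle = (low + high) / 2.0
        if g(middle) > t:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


def g_bounded(z):
    return 1.0 - z  # a = g(0) = 1


def g_unbounded(z):
    return math.inf if z == 0.0 else -math.log(z)  # a = g(0) = infinity


print(pseudo_inverse_decreasing(g_bounded, 0.25))
print(pseudo_inverse_decreasing(g_bounded, 1.5))
print(pseudo_inverse_decreasing(g_unbounded, 2.0), math.exp(-2.0))
```

For the bounded generator any argument above a = 1 is mapped to 0, which is exactly the extension of the ordinary inverse described above.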

The reason why we concentrated on continuous strictly decreasing functions
is that any continuous Archimedean t-norm can be represented with the
help of a continuous additive generator. An additive generator is a strictly
decreasing function $g : [0,1] \to [0,\infty]$ verifying g(1) = 0.


Proposition 3.37. Let T be a continuous Archimedean t-norm. Then it can
be written as

$$T(x, y) = g^{(-1)}(g(x) + g(y)), \tag{3.7}$$

where $g : [0,1] \to [0,\infty]$, g(1) = 0, is a continuous strictly decreasing
function, called an additive generator of T.


Note 3.38. For more than two arguments we have

$$T(\mathbf{x}) = g^{(-1)}(g(x_1) + g(x_2) + \ldots + g(x_n)).$$


5 Note that since g is a bijection, it verifies g(0) = a, g(1) = 0. We also
remind that a strictly monotone bijection is always continuous.

6 Section 3.4.6 treats pseudo-inverses in the general case, when g is not
necessarily continuous, strictly decreasing or a bijection.


Fig. 3.3. A typical additive generator of a nilpotent t-norm and its pseudo-inverse. 


Note 3.39. The converse of Proposition 3.37 is also true: Given any continuous
strictly decreasing function $g : [0,1] \to [0,\infty]$, g(1) = 0, the function
T defined in (3.7) is a continuous Archimedean t-norm. t-norms with additive
generators (continuous or discontinuous) are necessarily Archimedean.


Note 3.40. A t-norm with an additive generator is continuous if and only if its
additive generator is continuous.

Fig. 3.4. Construction of strict (left) and nilpotent t-norms (right) using
additive generators.


Example 3.41. Additive generators of the basic t-norms

1. If g(t) = 1 − t we obtain the Łukasiewicz t-norm,
2. If g(t) = −log t we obtain the product t-norm.
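The two generators of Example 3.41 can be plugged into the representation $T(\mathbf{x}) = g^{(-1)}(g(x_1) + \ldots + g(x_n))$ to recover the Łukasiewicz and product t-norms. In this sketch the pseudo-inverses are written in closed form; the helper names are ours, and g(0) is set to infinity for the logarithmic generator.

```python
import math


def generated_t_norm(g, g_pseudo_inverse):
    """n-ary t-norm built from an additive generator and its pseudo-inverse."""
    def t_norm(*x):
        return g_pseudo_inverse(sum(g(value) for value in x))
    return t_norm


def g_lukasiewicz(t):
    return 1.0 - t


def g_lukasiewicz_pinv(s):
    return max(0.0, 1.0 - s)  # arguments above a = 1 are mapped to 0


def g_product(t):
    return math.inf if t == 0.0 else -math.log(t)


def g_product_pinv(s):
    return math.exp(-s)  # exp(-inf) evaluates to 0.0


t_lukasiewicz = generated_t_norm(g_lukasiewicz, g_lukasiewicz_pinv)
t_product = generated_t_norm(g_product, g_product_pinv)

print(t_lukasiewicz(0.7, 0.6), t_lukasiewicz(0.3, 0.4))
print(t_product(0.5, 0.4), t_product(0.0, 0.5))
```

The only multivariate operation used is a sum, which is why generators reduce an n-ary t-norm to evaluations of univariate functions.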


Note 3.42. The minimum t-norm has no additive generator (it is not Archimedean). 


Proposition 3.43. An additive generator is defined up to a positive multi- 
plicative constant, i.e., if g(t) is an additive generator of T, then cg(t),c > 0 
is also an additive generator of T. 


Thus we can have multiple additive generators of the same Archimedean 
t-norm T, for example g(t) = —2log(t) = —log(t?) and g(t) = —log(t) are 
both additive generators of the product t-norm. 


Proposition 3.44. If $g : [0,1] \to [0,\infty]$ is an additive generator of a
continuous Archimedean t-norm T, then:


• T is strict if and only if $g(0) = \infty$;
• T is nilpotent if and only if $g(0) < \infty$.


By using duality, we obtain analogous definitions and properties for t- 
conorms. Note though, that the additive generators of continuous Archimedean 
t-conorms are strictly increasing. 


Proposition 3.45. Let S be a continuous Archimedean t-conorm. Then it
can be written as

$$S(x, y) = h^{(-1)}(h(x) + h(y)), \tag{3.8}$$

where $h : [0,1] \to [0,\infty]$, h(0) = 0, is a continuous strictly increasing
function, called an additive generator of S.⁷


Note 3.46. The converse of Proposition 3.45 is also true: Given any continuous
strictly increasing function $h : [0,1] \to [0,\infty]$, h(0) = 0, the function
S defined in (3.8) is a continuous Archimedean t-conorm.


Due to the duality between t-norms and t-conorms, additive generators of 
t-conorms can be obtained from the additive generators of their dual t-norms. 


Proposition 3.47. Let T be a t-norm, S its dual t-conorm, and
$g : [0,1] \to [0,\infty]$ an additive generator of T. The function
$h : [0,1] \to [0,\infty]$ defined by h(t) = g(1 − t) is an additive generator
of S.


By using duality we also obtain the following results: 


• An additive generator of a t-conorm is defined up to an arbitrary positive
  multiplier.

• A t-conorm with an additive generator is continuous if and only if its
  additive generator is continuous.

• A continuous Archimedean t-conorm with an additive generator h is strict
  if and only if $h(1) = \infty$.

• A continuous Archimedean t-conorm with an additive generator h is
  nilpotent if and only if $h(1) < \infty$.


7 The expressions for the pseudo-inverse for continuous strictly increasing
functions change to

$$h^{(-1)}(t) = \begin{cases} h^{-1}(t), & \text{if } t \in [0,a], \text{ where } a = h(1), \\ 1 & \text{otherwise,} \end{cases}$$

and $h^{(-1)}(t) = \sup\{z \in [0,1] \mid h(z) < t\}$, see also Section 3.4.6.


Example 3.48. Additive generators of the basic t-conorms 


1. If h(t) = t we obtain the Lukasiewicz t-conorm, 
2. If h(t) = —log(1 — t) we obtain the probabilistic sum t-conorm. 
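The generators in Example 3.48 can be derived mechanically from those of the dual t-norms through h(t) = g(1 − t). The sketch below also uses the relation $h^{(-1)}(s) = 1 - g^{(-1)}(s)$ for the pseudo-inverses, which follows from that substitution but is stated here as our own working assumption; all function names are ours.

```python
import math


def g_product(t):
    return math.inf if t == 0.0 else -math.log(t)


def g_product_pinv(s):
    return math.exp(-s)


def g_lukasiewicz(t):
    return 1.0 - t


def g_lukasiewicz_pinv(s):
    return max(0.0, 1.0 - s)


def dual_generator(g, g_pinv):
    """Additive generator of the dual t-conorm and its pseudo-inverse."""
    def h(t):
        return g(1.0 - t)

    def h_pinv(s):
        return 1.0 - g_pinv(s)

    return h, h_pinv


def generated_t_conorm(h, h_pinv):
    def t_conorm(*x):
        return h_pinv(sum(h(value) for value in x))
    return t_conorm


s_prob_sum = generated_t_conorm(*dual_generator(g_product, g_product_pinv))
s_lukasiewicz = generated_t_conorm(*dual_generator(g_lukasiewicz, g_lukasiewicz_pinv))

print(s_prob_sum(0.2, 0.5), s_lukasiewicz(0.2, 0.3), s_lukasiewicz(0.7, 0.6))
```

For the Łukasiewicz case the derived generator is h(t) = 1 − (1 − t), which equals t up to rounding, and for the product it is h(t) = −log(1 − t).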


Note 3.49. The maximum t-conorm has no additive generators (it is not Archimedean).


From an additive generator $g : [0,1] \to [0,\infty]$ of any t-norm it is
possible to define its corresponding multiplicative generator in the form
$\theta(t) = e^{-g(t)}$. The function $\theta : [0,1] \to [0,1]$,
$\theta(1) = 1$, is continuous and strictly increasing.


Proposition 3.50. Let T be a continuous Archimedean t-norm. Then it can
be written as

$$T(x, y) = \theta^{(-1)}(\theta(x) \cdot \theta(y)), \tag{3.9}$$

where $\theta : [0,1] \to [0,1]$, $\theta(1) = 1$, is a continuous strictly
increasing function, called a multiplicative generator of T.


For t-conorms we have the analogous result: if h is an additive generator of
S, then the function $\varphi(t) = e^{-h(t)}$ is a strictly decreasing function
$\varphi : [0,1] \to [0,1]$, $\varphi(0) = 1$, called a multiplicative generator
of S.


Proposition 3.51. Let S be a continuous Archimedean t-conorm. Then it
can be written as

$$S(x, y) = \varphi^{(-1)}(\varphi(x) \cdot \varphi(y)), \tag{3.10}$$

where $\varphi : [0,1] \to [0,1]$, $\varphi(0) = 1$, is a continuous strictly
decreasing function, called a multiplicative generator of S.


Note 3.52. Any continuous strictly increasing function
$\theta : [0,1] \to [0,1]$ with $\theta(1) = 1$ defines a continuous
Archimedean t-norm by (3.9). Any continuous strictly decreasing function
$\varphi : [0,1] \to [0,1]$ with $\varphi(0) = 1$ defines a continuous
Archimedean t-conorm by (3.10).
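As a small check of the multiplicative representation, the sketch below builds the Łukasiewicz t-norm from $\theta(t) = e^{-(1-t)}$, i.e., from the additive generator g(t) = 1 − t. The pseudo-inverse of θ used here maps arguments below θ(0) to 0; this follows the sup-based definition for increasing functions and is labeled as an assumption of the example, as are the function names.

```python
import math


def theta(t):
    """Multiplicative generator exp(-g(t)) for g(t) = 1 - t."""
    return math.exp(t - 1.0)


def theta_pinv(u):
    # Assumption: arguments below theta(0) = exp(-1) are mapped to 0.
    if u < theta(0.0):
        return 0.0
    return max(0.0, 1.0 + math.log(u))  # guard against rounding below 0


def t_from_multiplicative(x, y):
    return theta_pinv(theta(x) * theta(y))


def t_lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)


print(t_from_multiplicative(0.7, 0.6), t_lukasiewicz(0.7, 0.6))
print(t_from_multiplicative(0.3, 0.4), t_lukasiewicz(0.3, 0.4))
```

Since the product of exponentials is the exponential of the sum, the multiplicative and additive representations describe the same t-norm.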


3.4.6 Pseudo-inverses 


In this section we study the definition of pseudo-inverses in greater detail,
which allows us to define them for discontinuous functions.


Definition 3.53 (Pseudo-inverse of a monotone function). Let
$g : [a,b] \to [c,d]$ be a monotone function, where [a,b], [c,d] are
subintervals of the extended real line. Then the pseudo-inverse of g,
$g^{(-1)} : [c,d] \to [a,b]$, is defined by

$$g^{(-1)}(t) = \sup\{z \in [a,b] \mid (g(z) - t)(g(b) - g(a)) < 0\}.$$

From the above definition we have:


Corollary 3.54. Let $g : [a,b] \to [c,d]$ be a monotone function, where
[a,b], [c,d] are subintervals of the extended real line.


• If g(a) < g(b) we obtain the following formula

$$g^{(-1)}(t) = \sup\{z \in [a,b] \mid g(z) < t\}$$

  for all $t \in [c,d]$.

• If g(a) > g(b) we obtain the following formula

$$g^{(-1)}(t) = \sup\{z \in [a,b] \mid g(z) > t\}$$

  for all $t \in [c,d]$.

• If g(a) = g(b), for all $t \in [c,d]$ we have

$$g^{(-1)}(t) = a.$$

The illustration of this is given in Fig. 3.5, which shows how to construct
the graph of the pseudo-inverse $g^{(-1)}$ of the function $g : [0,1] \to [0,a]$:


1. Draw vertical line segments at discontinuities of g.

2. Reflect the graph of g at the graph of the identity function on the extended
   real line (dashed line).

3. Remove any vertical line segments from the reflected graph except for
   their lowest points.

Fig. 3.5. Construction of the pseudo-inverse in the general case. 


By using this more general definition of the pseudo-inverse, we can
construct Archimedean t-norms which have discontinuous additive generators.


Example 3.55. The drastic product t-norm has a discontinuous additive
generator

$$g(t) = \begin{cases} 2 - t, & \text{if } t \in [0,1[, \\ 0, & \text{if } t = 1. \end{cases}$$
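The general pseudo-inverse can be approximated by taking the supremum over a finite grid, which is enough to see that the discontinuous generator of Example 3.55 reproduces the drastic product. The grid resolution, the function names and the convention that an empty set yields the left endpoint a are assumptions of this sketch, and the results are accurate only up to the grid step.

```python
def pseudo_inverse(g, a=0.0, b=1.0, steps=1000):
    """Grid approximation of sup{z in [a,b] : (g(z) - t)(g(b) - g(a)) < 0}."""
    direction = g(b) - g(a)
    grid = [a + (b - a) * i / steps for i in range(steps + 1)]

    def g_pinv(t):
        candidates = [z for z in grid if (g(z) - t) * direction < 0]
        return max(candidates) if candidates else a  # empty set gives a

    return g_pinv


def g_drastic(t):
    return 2.0 - t if t < 1.0 else 0.0


g_drastic_pinv = pseudo_inverse(g_drastic)


def t_generated(x, y):
    return g_drastic_pinv(g_drastic(x) + g_drastic(y))


print(t_generated(0.4, 1.0))  # close to 0.4, within one grid step
print(t_generated(0.4, 0.9))  # 0.0
print(t_generated(1.0, 1.0))  # close to 1.0, within one grid step
```

When both arguments are below 1, the sum g(x) + g(y) exceeds 2, no grid point satisfies the condition, and the result is exactly 0, as required for the drastic product.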
3.4.7 Isomorphic t-norms


Let us recall transformations of aggregation functions on p. 29, namely

$$g(\mathbf{x}) = \psi(f(\varphi(x_1), \varphi(x_2), \ldots, \varphi(x_n))),$$

where $\psi, \varphi$ are univariate strictly increasing bijections. We saw
that the resulting function g is an aggregation function, associative if f is
associative and $\psi = \varphi^{-1}$. Let us now take an automorphism of the
unit interval $\varphi : [0,1] \to [0,1]$. We remind that an automorphism of
the interval [0,1] is a continuous and strictly increasing function $\varphi$
verifying $\varphi(0) = 0$, $\varphi(1) = 1$. An automorphism is obviously a
bijection. Hence $\varphi$ has the inverse $\varphi^{-1}$.


Example 3.56. Some examples of automorphisms $\varphi : [0,1] \to [0,1]$ are
given below:

• $\varphi(t) = \frac{2t}{t+1}$.
• $\varphi(t) = t^\lambda$, $\lambda > 0$.
• $\varphi(t) = 1 - (1-t)^\lambda$, $\lambda > 0$.
• $\varphi(t) = \frac{\lambda^t - 1}{\lambda - 1}$, $\lambda > 0$, $\lambda \neq 1$.
• $\varphi(t) = \frac{\log(1 + \lambda t^\alpha)}{\log(1 + \lambda)}$, $\lambda > -1$, $\alpha > 0$.

Fig. 3.6. Graphs of the automorphisms in Example 3.56 with a fixed parameter $\lambda$.


Note that if $T$ is a t-norm and $\varphi$ is an automorphism of the unit
interval, then the function $T_\varphi : [0,1]^2 \to [0,1]$, defined as

\[ T_\varphi(x,y) = \varphi^{-1}(T(\varphi(x), \varphi(y))) \qquad (3.11) \]

is also a t-norm, which is said to be isomorphic$^{8}$ to $T$.

Proposition 3.57. Let $T$ be an Archimedean t-norm with an additive generator
$g : [0,1] \to [0,\infty]$, $g(1) = 0$, and $\varphi$ an automorphism of $[0,1]$.

1. The function
   \[ T_\varphi(x,y) = \varphi^{-1}(T(\varphi(x), \varphi(y))) \]
   is an Archimedean t-norm, isomorphic to $T$.

2. $\tilde g = g \circ \varphi : [0,1] \to [0,\infty]$ is an additive generator
   of the Archimedean t-norm $T_\varphi$.

3. $T_\varphi$ is continuous if and only if $T$ is continuous.

4. $T_\varphi$ is strict if and only if $T$ is strict.

5. $T_\varphi$ is nilpotent if and only if $T$ is nilpotent.

One is not restricted to automorphisms of the unit interval: strictly
increasing bijections $\psi : [0,\infty] \to [0,\infty]$ can also be used to
obtain t-norms isomorphic to a given t-norm $T$.

Proposition 3.58. Let $T$ be a continuous Archimedean t-norm with an additive
generator $g : [0,1] \to [0,\infty]$, and let $\psi$ be a strictly increasing
bijection $[0,\infty] \to [0,\infty]$.

1. The function $\hat g = \psi \circ g : [0,1] \to [0,\infty]$ is an additive
   generator of a continuous Archimedean t-norm $\hat T_\psi$$^{9}$ isomorphic
   to $T$.

2. $\hat T_\psi$ is strict if and only if $T$ is strict.

3. $\hat T_\psi$ is nilpotent if and only if $T$ is nilpotent.

The next statement is a very strong result, which establishes that all
continuous Archimedean t-norms are in essence isomorphic to the two
prototypical examples: the product and the Łukasiewicz t-norms.


8 In mathematics, the term isomorphism is usually applied to algebraic
structures such as semigroups. A detailed discussion of isomorphisms of
semigroups, and of t-norms and t-conorms as semigroup operations, is given in
[42, pp. 37-38].

9 Note that $\hat T_\psi$ is not related to $T_\varphi$ in (3.11) with
$\varphi = \psi$. However, for any strictly increasing bijection $\psi$ there
exists an automorphism $\varphi$ of $[0,1]$ such that $T_\varphi$ is isomorphic
to $T$ as in (3.11), and $\hat T_\psi = T_\varphi$. For a strict t-norm take
$\varphi = g^{-1} \circ \psi \circ g$. For a nilpotent t-norm we take
$\varphi = g^{(-1)} \circ \tilde\psi \circ g$, where
$\tilde\psi(t) = g(0)\,\psi(t)/\psi(g(0))$.
Proposition 3.59. Let $T$ be a continuous Archimedean t-norm.

• If $T$ is strict, then it is isomorphic to the product t-norm $T_P$, i.e.,
  there exists an automorphism $\varphi$ of the unit interval such that
  $T_\varphi = T_P$.

• If $T$ is nilpotent, then it is isomorphic to the Łukasiewicz t-norm $T_L$,
  i.e., there exists an automorphism $\varphi$ of the unit interval such that
  $T_\varphi = T_L$.


Let us now check what happens with the other two basic t-norms, the drastic
product and the minimum (recall that neither of these is strict or nilpotent:
the minimum is not Archimedean, and the drastic product is not continuous). It
is easy to check that under any automorphism $\varphi$,
$(T_{\min})_\varphi = T_{\min}$ and $(T_D)_\varphi = T_D$, i.e., these two
t-norms do not change under any automorphism.

This does not mean that all t-norms (or even all continuous t-norms) are
isomorphic to just the four basic t-norms: there are many t-norms that are
continuous but not Archimedean. We will see, however, in Section 3.4.9, that
every continuous t-norm is either isomorphic to $T_P$ or $T_L$, or can be
constructed from these two t-norms and the minimum (the ordinal sum
construction).

By using automorphisms we can construct new families of t-norms based on an
existing t-norm.


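Example 3.60. Consider the Łukasiewicz t-norm $T_L$.

• Take $\varphi(t) = t^\lambda$ ($\lambda > 0$). Since
  $\varphi^{-1}(t) = t^{1/\lambda}$, we get
  \[ T_\varphi(x,y) = (\max(0, x^\lambda + y^\lambda - 1))^{1/\lambda}. \]

• Take $\varphi(t) = 1 - (1-t)^\lambda$ ($\lambda > 0$). The inverse is
  $\varphi^{-1}(t) = 1 - (1-t)^{1/\lambda}$, and therefore
  \[ T_\varphi(x,y) = 1 - [\min(1, (1-x)^\lambda + (1-y)^\lambda)]^{1/\lambda}. \]

Example 3.61. Consider the product t-norm $T_P$.

• Take $\varphi(t) = 1 - (1-t)^\lambda$ ($\lambda > 0$), then
  \[ T_\varphi(x,y) = 1 - [(1-x)^\lambda + (1-y)^\lambda - (1-x)^\lambda (1-y)^\lambda]^{1/\lambda}. \]

• Take $\varphi(t) = \frac{2t}{t+1}$. Since $\varphi^{-1}(t) = \frac{t}{2-t}$,
  we get
  \[ T_\varphi(x,y) = \frac{2xy}{x + y + 1 - xy}. \]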
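
Formula (3.11) translates directly into code. The following sketch (hypothetical helper names, with $\lambda = 2$ chosen arbitrarily for illustration) builds $T_\varphi$ from a t-norm and an automorphism with its inverse, and reproduces two of the closed forms from Examples 3.60 and 3.61.

```python
def isomorphic_t_norm(t_norm, phi, phi_inv):
    """Return T_phi(x, y) = phi^{-1}(T(phi(x), phi(y))), following (3.11)."""
    def t_phi(x, y):
        return phi_inv(t_norm(phi(x), phi(y)))
    return t_phi


def lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)


def product(x, y):
    return x * y


lam = 2.0  # illustrative parameter value, not prescribed by the text

# Example 3.60, first item: phi(t) = t^lam applied to the Lukasiewicz t-norm.
t_power = isomorphic_t_norm(
    lukasiewicz, lambda t: t ** lam, lambda t: t ** (1.0 / lam)
)

# Example 3.61, second item: phi(t) = 2t / (t + 1) applied to the product.
t_rational = isomorphic_t_norm(
    product, lambda t: 2.0 * t / (t + 1.0), lambda t: t / (2.0 - t)
)
```

Passing the inverse explicitly keeps the sketch simple; for automorphisms without a closed-form inverse one would have to invert $\varphi$ numerically.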


Moreover, Propositions 3.57 and 3.58 present an opportunity to construct, based
on any continuous Archimedean t-norm, interesting families of t-norms by
modifying their additive generators.

Example 3.62. Let $T$ be a continuous Archimedean t-norm and
$g : [0,1] \to [0,\infty]$ its additive generator.

• For each $\lambda \in ]0,\infty[$ the function
  $\psi \circ g = g^\lambda : [0,1] \to [0,\infty]$ is also an additive
  generator of a continuous Archimedean t-norm, which we will denote by
  $T^{(\lambda)}$. The family of these t-norms is strictly increasing with
  respect to the parameter $\lambda$. Curiously, adding the limit cases
  $T^{(0)} = T_D$ and $T^{(\infty)} = \min$ to this family, we get the
  well-known families of t-norms of Yager, Aczél-Alsina and Dombi (see
  Section 3.4.11), depending on whether the initial t-norm is $T_L$, $T_P$ or
  $T^H_0$, respectively.

• Let $T^*$ be a strict t-norm with an additive generator
  $g^* : [0,1] \to [0,\infty]$. Then for each $\lambda \in ]0,\infty[$, the
  function $g_{(g^*,\lambda)} : [0,1] \to [0,\infty]$ defined by
  \[ g_{(g^*,\lambda)}(t) = g((g^*)^{-1}(\lambda g^*(t))) \]
  is an additive generator of a continuous Archimedean t-norm, which we will
  denote by $T_{(T^*,\lambda)}$. For example, for $\lambda \in ]0,\infty[$ we
  have $(T_P)_{(T^H_0,\lambda)} = T^H_\lambda$, the Hamacher t-norms (see
  Section 3.4.11).


3.4.8 Comparison of continuous Archimedean t-norms 


In Section 1.3.4 we defined standard pointwise comparison of aggregation
functions. We have seen that the basic t-norms verify

\[ T_D \le T_L \le T_P \le \min. \]

However, not all t-norms are comparable, although many of them, especially
those from parametric families, are. It is possible to find pairs of
incomparable strict and nilpotent t-norms.

Whether two continuous Archimedean t-norms are comparable can be decided from
the properties of their additive generators. This idea was introduced by
Schweizer and Sklar in [221]. We summarize their results.


Proposition 3.63. Let $T_1$ and $T_2$ be two continuous Archimedean t-norms
with additive generators $g_1, g_2 : [0,1] \to [0,\infty]$, respectively.

(i) The following are equivalent:
    1. $T_1 \le T_2$.
    2. The function $(g_1 \circ g_2^{-1}) : [0, g_2(0)] \to [0,\infty]$ is
       subadditive, i.e., for all $x, y \in [0, g_2(0)]$ with
       $x + y \in [0, g_2(0)]$ we have
       \[ (g_1 \circ g_2^{-1})(x + y) \le (g_1 \circ g_2^{-1})(x) + (g_1 \circ g_2^{-1})(y). \]

(ii) If the function $s = (g_1 \circ g_2^{-1}) : [0, g_2(0)] \to [0,\infty]$ is
     concave$^{10}$, then $T_1 \le T_2$.

(iii) If the function $f(t) = \frac{(g_1 \circ g_2^{-1})(t)}{t} : ]0, g_2(0)] \to [0,\infty]$
      is non-increasing, then $T_1 \le T_2$.

10 I.e., $s(a t_1 + (1-a) t_2) \ge a\,s(t_1) + (1-a)\,s(t_2)$ holds for all
$t_1, t_2 \in \mathrm{Dom}(s)$ and $0 \le a \le 1$.

Note 3.64. Each concave function $f : [0,a] \to [0,\infty]$ verifying
$f(0) = 0$ is subadditive. However, in general, subadditivity of a function
does not imply its concavity. For instance, the following strictly increasing
and continuous function $g : [0,\infty] \to [0,\infty]$ is subadditive but
evidently not concave:

\[ g(t) = \begin{cases} 3t, & \text{if } t \in [0,3], \\ t + 6, & \text{if } t \in ]3,5], \\ 2t + 1 & \text{otherwise.} \end{cases} \]
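
The claim of Note 3.64 is easy to probe numerically. The sketch below assumes that the middle piece of the function is $t + 6$, which is the value that makes $g$ continuous at $t = 3$ and $t = 5$ as the note requires. It tests subadditivity on a finite grid only, so it illustrates the claim rather than proving it.

```python
def g(t):
    """Piecewise linear function of Note 3.64 (middle piece assumed to be t + 6)."""
    if t <= 3.0:
        return 3.0 * t
    if t <= 5.0:
        return t + 6.0
    return 2.0 * t + 1.0


# Grid 0, 0.25, ..., 20; all values are exactly representable as floats.
grid = [i / 4 for i in range(81)]

# Subadditivity: g(x + y) <= g(x) + g(y) for all grid pairs.
subadditive_on_grid = all(g(x + y) <= g(x) + g(y) for x in grid for y in grid)

# Concavity would require g(5) >= (g(4) + g(6)) / 2; the slope increase
# from 1 to 2 at t = 5 violates this.
concavity_violated = g(5.0) < 0.5 * (g(4.0) + g(6.0))
```

Here $g(t)/t$ is non-increasing, which is the same sufficient condition as in Proposition 3.63(iii) and explains why the grid test succeeds.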


3.4.9 Ordinal sums 


We now consider an interesting and powerful construction, which allows one to
build new t-norms/t-conorms from scaled versions of existing
t-norms/t-conorms. It works as follows: consider the domain of a bivariate
t-norm, the unit square $[0,1]^2$, see Fig. 3.7. Take the diagonal of this
square, and define smaller squares on that diagonal as shown (define them
using the lower and upper bounds, e.g., the bounds $a_1, b_1$ define
$[a_1,b_1]^2$). Now define $T$ on each of the squares $[a_i,b_i]^2$ as a scaled
t-norm $T_i$ (as in the following definition), and as the minimum everywhere
else. The function $T$ defined in this piecewise manner is itself a t-norm,
called an ordinal sum.


Definition 3.65 (Ordinal sum). Let $(T_i)_{i=1,\ldots,K}$ be a family of
t-norms and $(]a_i,b_i[)_{i=1,\ldots,K}$ be a family of non-empty, pairwise
disjoint open subintervals of $[0,1]$. The function $T : [0,1]^2 \to [0,1]$
given by

\[ T(x,y) = \begin{cases} a_i + (b_i - a_i)\, T_i\!\left(\frac{x - a_i}{b_i - a_i}, \frac{y - a_i}{b_i - a_i}\right), & \text{if } (x,y) \in [a_i,b_i]^2, \\ \min(x,y) & \text{otherwise} \end{cases} \qquad (3.12) \]

is a t-norm which is called the ordinal sum of the summands
$\langle a_i, b_i, T_i \rangle$, $i = 1,\ldots,K$, and denoted by
$T = (\langle a_i, b_i, T_i \rangle)_{i=1,\ldots,K}$.

Note 3.66. In fact one has to prove that the resulting ordinal sum is a t-norm,
the key issue being associativity. This has been proved elsewhere, and we take
it for granted.


Example 3.67.

1. Each t-norm $T$ is a trivial ordinal sum with just one summand
   $\langle 0,1,T \rangle$, i.e., we have $T = (\langle 0,1,T \rangle)$.

2. The ordinal sum
   $T = (\langle 0.2, 0.5, T_P \rangle, \langle 0.5, 0.8, T_L \rangle)$ is
   given by
   \[ T(x,y) = \begin{cases} 0.2 + \frac{(x - 0.2)(y - 0.2)}{0.3}, & \text{if } (x,y) \in [0.2,0.5]^2, \\ 0.5 + \max(x + y - 1.3, 0), & \text{if } (x,y) \in [0.5,0.8]^2, \\ \min(x,y) & \text{otherwise.} \end{cases} \]

3. An ordinal sum of t-norms may have infinitely many summands. For example,
   $T = (\langle \frac{1}{2^{n+1}}, \frac{1}{2^n}, T_P \rangle)_{n \in N}$ is
   given by
   \[ T(x,y) = \begin{cases} \frac{1}{2^{n+1}} + 2^{n+1}\left(x - \frac{1}{2^{n+1}}\right)\left(y - \frac{1}{2^{n+1}}\right), & \text{if } (x,y) \in \left[\frac{1}{2^{n+1}}, \frac{1}{2^n}\right]^2, \\ \min(x,y) & \text{otherwise.} \end{cases} \]

The 3D plots of the two mentioned ordinal sums are shown in Fig. 3.8.

Fig. 3.7. Construction of ordinal sums: parts of the domain on which the
summands are applied.

Fig. 3.8. 3D plots of the ordinal sums in Example 3.67 (with n = 1,...,4 on the
right).
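
Definition 3.65 can be sketched in a few lines of code. The helper below is a hypothetical illustration (names and the list-of-triples representation are assumptions of this sketch, and a finite number of summands is assumed); it is instantiated with the second ordinal sum of Example 3.67.

```python
def product(x, y):
    return x * y


def lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)


def ordinal_sum(summands):
    """Ordinal sum of t-norms, following (3.12).

    summands: list of triples (a, b, t_norm) whose open intervals ]a, b[
    are assumed to be non-empty and pairwise disjoint.
    """
    def t(x, y):
        for a, b, t_i in summands:
            if a <= x <= b and a <= y <= b:
                # Rescale to the unit square, apply T_i, and scale back.
                u = (x - a) / (b - a)
                v = (y - a) / (b - a)
                return a + (b - a) * t_i(u, v)
        return min(x, y)
    return t


# Example 3.67(2): (<0.2, 0.5, T_P>, <0.5, 0.8, T_L>)
t_example = ordinal_sum([(0.2, 0.5, product), (0.5, 0.8, lukasiewicz)])
```

Two closed squares can share only a corner point $(b_i, b_i)$, where both summands return $b_i$, so the order in which the summands are scanned does not matter.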


For the case of t-conorms we have an analogous definition.

Definition 3.68 (Ordinal sum of t-conorms). Let $(S_i)_{i=1,\ldots,K}$ be a
family of t-conorms and $(]a_i,b_i[)_{i=1,\ldots,K}$ be a family of non-empty,
pairwise disjoint open subintervals of $[0,1]$. The function
$S : [0,1]^2 \to [0,1]$ defined by

\[ S(x,y) = \begin{cases} a_i + (b_i - a_i)\, S_i\!\left(\frac{x - a_i}{b_i - a_i}, \frac{y - a_i}{b_i - a_i}\right), & \text{if } (x,y) \in [a_i,b_i]^2, \\ \max(x,y) & \text{otherwise} \end{cases} \qquad (3.13) \]

is a t-conorm which is called the ordinal sum of the summands
$\langle a_i, b_i, S_i \rangle$, $i = 1,\ldots,K$, and denoted by
$S = (\langle a_i, b_i, S_i \rangle)_{i=1,\ldots,K}$.

Note 3.69. Let $(\langle a_i, b_i, T_i \rangle)_{i=1,\ldots,K}$ be an ordinal
sum of t-norms. Then the dual t-conorm is just an ordinal sum of t-conorms,
i.e., $(\langle 1 - b_i, 1 - a_i, S_i \rangle)_{i=1,\ldots,K}$, where each
t-conorm $S_i$ is the dual of the t-norm $T_i$.


If each summand of an ordinal sum of t-norms is a continuous Archimedean
t-norm, i.e., it has an additive generator, then the ordinal sum can also be
expressed through additive generators:

Proposition 3.70. Let $(T_i)_{i=1,\ldots,K}$ be a family of t-norms and assume
that for each $i \in I = \{1,\ldots,K\}$ the t-norm $T_i$ has an additive
generator $g_i : [0,1] \to [0,\infty]$. For each $i \in I$ define the function
$h_i : [a_i,b_i] \to [0,\infty]$ by $h_i = g_i \circ \varphi_i$, where
$\varphi_i : [a_i,b_i] \to [0,1]$ is given by
$\varphi_i(t) = \frac{t - a_i}{b_i - a_i}$. Then the ordinal sum
$T = (\langle a_i, b_i, T_i \rangle)_{i \in I}$ satisfies, for all
$(x,y) \in [0,1]^2$,

\[ T(x,y) = \begin{cases} h_i^{(-1)}(h_i(x) + h_i(y)), & \text{if } (x,y) \in [a_i,b_i]^2, \\ \min(x,y) & \text{otherwise.} \end{cases} \]

We can see that an additive generator of each summand of the ordinal sum is
defined on the corresponding subinterval $[a_i,b_i]$ by means of an isomorphism
acting on an additive generator $g_i$. This is equivalent to re-scaling the
generator, and it allows one to define ordinal sums of Archimedean t-norms
using the additive generators of the summands.


Example 3.71.

1. For the ordinal sum $T$ in Example 3.67(2) the functions
   $h_1 : [0.2,0.5] \to [0,\infty]$, $h_2 : [0.5,0.8] \to [0,\infty]$ are given
   by, respectively,
   \[ h_1(t) = -\log\frac{10t - 2}{3}, \qquad h_2(t) = \frac{8 - 10t}{3}. \]

2. In Example 3.67(3), for each $n \in N$ the function
   $h_n : [\frac{1}{2^{n+1}}, \frac{1}{2^n}] \to [0,\infty]$ is given by
   \[ h_n(t) = -\log(2^{n+1} t - 1). \]

Note 3.72. The representation of a t-norm as an ordinal sum of t-norms is not
unique, in general. For instance, for each subinterval $[a,b]$ of $[0,1]$ we
have

\[ \min = (\langle 0, 1, \min \rangle) = (\langle a, b, \min \rangle). \]

There are many uses of the ordinal sum construction. Firstly, it allows one to
prove certain theoretical results, which would otherwise be hard to establish.
Secondly, it gives a way to define t-norms/t-conorms with some desired
properties and behavior. We shall use this construction in Section 3.7.

However, one of the most important results based on this construction is the
following.


Proposition 3.73 (Continuous t-norms classification). A continuous t-norm $T$
satisfies exactly one of the following:

• $T$ is isomorphic to the product t-norm $T_P$ (hence $T$ is strict);
• $T$ is isomorphic to the Łukasiewicz t-norm $T_L$ (hence $T$ is nilpotent);
• $T$ is the minimum;
• $T$ is a non-trivial ordinal sum of continuous Archimedean t-norms.

For t-conorms we obtain an analogous result by duality.

Proposition 3.74 (Continuous t-conorms classification). A continuous t-conorm
$S$ satisfies exactly one of the following:

• $S$ is isomorphic to the dual product t-conorm $S_P$ (hence $S$ is strict);
• $S$ is isomorphic to the Łukasiewicz t-conorm $S_L$ (hence $S$ is nilpotent);
• $S$ is the maximum;
• $S$ is a non-trivial ordinal sum of continuous Archimedean t-conorms.

3.4.10 Approximation of continuous t-norms


Continuous Archimedean t-norms and t-conorms have a very convenient
representation through their additive or multiplicative generators.
Essentially, one has to deal with a continuous strictly monotone univariate
function in order to generate a whole family of n-variate aggregation
functions. Of course, this representation does not hold for all continuous
t-norms (t-conorms), which can be non-Archimedean (in which case they are
either the minimum (the maximum for t-conorms) or ordinal sums of Archimedean
t-norms (t-conorms)).

Our next question is whether any continuous t-norm can be approximated
sufficiently well by a continuous Archimedean t-norm, in which case we could
apply the mechanism of additive generators to represent, up to a certain
precision, all continuous t-norms. The answer to this question is positive;
see also [149].


Proposition 3.75 (Approximation of a continuous t-norm). Any continuous t-norm
$T$ can be approximated uniformly$^{11}$ with any desired precision
$\varepsilon$ by some continuous Archimedean t-norm $T_A$.

In other words, the set of continuous Archimedean t-norms is dense in the set
of all continuous t-norms. Of course the analogous result holds for t-conorms.
Thus we can essentially substitute any continuous t-norm with a continuous
Archimedean t-norm, in such a way that the values of both functions at any
point $x \in [0,1]^n$ do not differ by more than $\varepsilon > 0$, and
$\varepsilon$ can be made arbitrarily small. Specifically, when using t-norms
(or t-conorms) as aggregation functions on a computer (which, of course, has
finite precision), we can just use continuous Archimedean t-norms (t-conorms),
as there will be no noticeable numerical difference between an Archimedean and
any other continuous t-norm (t-conorm).

The second important result is related to the use of additive generators. We
know that a continuous Archimedean t-norm can be represented by using its
additive generators, which means that its value at any point $(x,y)$ can be
calculated using formula (3.7). The question is: if we take two additive
generators that are close in some sense (e.g., pointwise), will the resulting
t-norms be close as well? The answer to this question is also positive; see
also [149].

Proposition 3.76 (Convergence of additive generators). Let $T_i$,
$i = 1,2,\ldots$ be a sequence of continuous Archimedean t-norms with additive
generators $g_i$, $i = 1,2,\ldots$, such that $g_i(0.5) = 1$, and let $T$ be
some continuous Archimedean t-norm with additive generator $g$ such that
$g(0.5) = 1$. Then

\[ \lim_{i \to \infty} T_i = T \]

if and only if for each $t \in ]0,1]$ we have

\[ \lim_{i \to \infty} g_i(t) = g(t). \]

11 Uniform approximation of $f$ by $\tilde f$ means that
$\max_{x \in [0,1]^n} |f(x) - \tilde f(x)| \le \varepsilon$.

The convergence of the sequence of t-norms is pointwise and uniform. The
condition $g_i(0.5) = 1$ is technical: since additive generators are not
defined uniquely, but only up to an arbitrary positive multiplier, we need to
fix a single generator for a given t-norm, and we do it by using the mentioned
condition.

Propositions 3.75 and 3.76 together imply that any continuous t-norm can be
approximated (uniformly, and up to any desired accuracy) by approximating a
univariate function, namely an additive generator. We will use this fact in
Section 3.4.15, where we discuss constructions of t-norms and t-conorms based
on empirically collected data.
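
As a small numerical illustration of Proposition 3.75, the sketch below approximates the non-Archimedean t-norm $\min$ by Aczél-Alsina t-norms (Section 3.4.11), which are continuous Archimedean for finite $\lambda > 0$ and have $\min$ as their limit for $\lambda \to \infty$. The function names, the choice of this particular family and the grid resolution are assumptions made for illustration, and the grid maximum only estimates the true uniform distance from below.

```python
import math


def aczel_alsina(lam):
    """Aczel-Alsina t-norm with parameter lam in ]0, inf[."""
    def t(x, y):
        if x == 0.0 or y == 0.0:
            return 0.0  # absorbing element
        if x == 1.0:
            return y    # neutral element
        if y == 1.0:
            return x
        s = (-math.log(x)) ** lam + (-math.log(y)) ** lam
        return math.exp(-(s ** (1.0 / lam)))
    return t


def max_grid_distance(t1, t2, n=100):
    """Largest |t1 - t2| over the (n+1) x (n+1) uniform grid on [0,1]^2."""
    pts = [i / n for i in range(n + 1)]
    return max(abs(t1(x, y) - t2(x, y)) for x in pts for y in pts)


# Distance to min for increasing values of the parameter.
errors = [max_grid_distance(min, aczel_alsina(lam)) for lam in (1.0, 10.0, 100.0)]
```

The computed distances shrink as $\lambda$ grows, so a sufficiently large $\lambda$ gives an Archimedean t-norm within any prescribed tolerance of $\min$ on the grid.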


3.4.11 Families of t-norms 


In this section we provide the reader with a list of parameterized families of
t-norms. We will consider the main families of t-norms and t-conorms:
Schweizer-Sklar, Hamacher, Frank, Yager, Dombi, Aczél-Alsina, Mayor-Torrens
and Sugeno-Weber t-norms and t-conorms, and closely follow [149]. Some of
these families include the basic t-norms/t-conorms as limiting cases. Because
of associativity, we only need to provide the expressions in the bivariate
case; n-variate formulae can be obtained by recursion.

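We will also mention monotonicity of each of these families and their
continuity with respect to the parameter.

Schweizer-Sklar

The family $(T^{SS}_\lambda)_{\lambda \in [-\infty,\infty]}$ of Schweizer-Sklar
t-norms is given by

\[ T^{SS}_\lambda(x,y) = \begin{cases} \min(x,y), & \text{if } \lambda = -\infty, \\ T_P(x,y), & \text{if } \lambda = 0, \\ T_D(x,y), & \text{if } \lambda = \infty, \\ (\max(x^\lambda + y^\lambda - 1, 0))^{1/\lambda} & \text{otherwise.} \end{cases} \]

The family $(S^{SS}_\lambda)_{\lambda \in [-\infty,\infty]}$ of Schweizer-Sklar
t-conorms is given by

\[ S^{SS}_\lambda(x,y) = \begin{cases} \max(x,y), & \text{if } \lambda = -\infty, \\ S_P(x,y), & \text{if } \lambda = 0, \\ S_D(x,y), & \text{if } \lambda = \infty, \\ 1 - (\max((1-x)^\lambda + (1-y)^\lambda - 1, 0))^{1/\lambda} & \text{otherwise.} \end{cases} \]

Limiting cases: $T^{SS}_{-\infty} = \min$, $T^{SS}_0 = T_P$, $T^{SS}_1 = T_L$,
$T^{SS}_\infty = T_D$, $S^{SS}_{-\infty} = \max$, $S^{SS}_0 = S_P$,
$S^{SS}_1 = S_L$, $S^{SS}_\infty = S_D$.

• For each $\lambda \in [-\infty,\infty]$ the t-norm $T^{SS}_\lambda$ and the
  t-conorm $S^{SS}_\lambda$ are dual to each other.
• $T^{SS}_\lambda$ and $S^{SS}_\lambda$ are continuous for all
  $\lambda \in [-\infty,\infty[$.
• $T^{SS}_\lambda$ is Archimedean for $\lambda \in ]-\infty,\infty]$.
• $T^{SS}_\lambda$ is strict if and only if $\lambda \in ]-\infty,0]$ and it is
  nilpotent if and only if $\lambda \in ]0,\infty[$.

Additive generators $g^{SS}_\lambda, h^{SS}_\lambda : [0,1] \to [0,\infty]$ of
the continuous Archimedean t-norms and t-conorms are given by, respectively,

\[ g^{SS}_\lambda(t) = \begin{cases} -\log t, & \text{if } \lambda = 0, \\ \frac{1 - t^\lambda}{\lambda}, & \text{if } \lambda \in ]-\infty,0[ \,\cup\, ]0,\infty[, \end{cases} \]

and

\[ h^{SS}_\lambda(t) = \begin{cases} -\log(1-t), & \text{if } \lambda = 0, \\ \frac{1 - (1-t)^\lambda}{\lambda}, & \text{if } \lambda \in ]-\infty,0[ \,\cup\, ]0,\infty[. \end{cases} \]

Fig. 3.9. 3D plots of some Schweizer-Sklar t-norms and their additive generators.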
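
The additive generator gives a second way of evaluating a Schweizer-Sklar t-norm, via $T(x,y) = g^{(-1)}(g(x) + g(y))$. The sketch below is a hedged illustration restricted to finite $\lambda$ and to inputs in $]0,1]$ (so that negative powers and logarithms stay finite); the pseudo-inverse is written out by hand by inverting the generator and truncating at 0, which is an assumption of this sketch rather than a formula quoted from the text.

```python
import math


def ss_generator(lam):
    """Additive generator of the Schweizer-Sklar t-norm for finite lam."""
    if lam == 0.0:
        return lambda t: -math.log(t)
    return lambda t: (1.0 - t ** lam) / lam


def ss_pseudo_inverse(lam):
    """Hand-derived pseudo-inverse of the generator above."""
    if lam == 0.0:
        return lambda s: math.exp(-s)
    return lambda s: max(1.0 - lam * s, 0.0) ** (1.0 / lam)


def ss_from_generator(lam):
    """Schweizer-Sklar t-norm evaluated as g^(-1)(g(x) + g(y))."""
    g, g_inv = ss_generator(lam), ss_pseudo_inverse(lam)
    return lambda x, y: g_inv(g(x) + g(y))


def ss_closed_form(lam):
    """Schweizer-Sklar t-norm from the closed-form expression."""
    if lam == 0.0:
        return lambda x, y: x * y
    return lambda x, y: max(x ** lam + y ** lam - 1.0, 0.0) ** (1.0 / lam)
```

Both evaluations agree up to rounding on a grid of inputs; because additive generators extend to $n$ arguments by summation, the generator form is the more convenient one for $n$-variate use.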
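Hamacher

The family $(T^H_\lambda)_{\lambda \in [0,\infty]}$ of Hamacher t-norms is
given by

\[ T^H_\lambda(x,y) = \begin{cases} T_D(x,y), & \text{if } \lambda = \infty, \\ 0, & \text{if } \lambda = x = y = 0, \\ \frac{xy}{\lambda + (1-\lambda)(x + y - xy)} & \text{otherwise.} \end{cases} \]

The family $(S^H_\lambda)_{\lambda \in [0,\infty]}$ of Hamacher t-conorms is
given by

\[ S^H_\lambda(x,y) = \begin{cases} S_D(x,y), & \text{if } \lambda = \infty, \\ 1, & \text{if } \lambda = 0 \text{ and } x = y = 1, \\ \frac{x + y - xy - (1-\lambda)xy}{1 - (1-\lambda)xy} & \text{otherwise.} \end{cases} \]

Hamacher t-norms and t-conorms are the only strict t-norms (t-conorms) that
can be expressed as rational functions.$^{12}$

Limiting cases: $T^H_\infty = T_D$, $T^H_1 = T_P$ and $S^H_\infty = S_D$,
$S^H_1 = S_P$. Moreover, $T^H_0$ and $S^H_2$, given by

\[ T^H_0(x,y) = \begin{cases} 0, & \text{if } x = y = 0, \\ \frac{xy}{x + y - xy} & \text{otherwise,} \end{cases} \qquad S^H_2(x,y) = \frac{x + y}{1 + xy}, \]

are respectively called the Hamacher product and the Einstein sum.

• For each $\lambda \in [0,\infty]$ the t-norm $T^H_\lambda$ and the t-conorm
  $S^H_\lambda$ are dual to each other.
• All Hamacher t-norms are Archimedean, and all Hamacher t-norms with the
  exception of $T^H_\infty$ are strict.
• There are no nilpotent Hamacher t-norms.
• The family of Hamacher t-norms is strictly decreasing with $\lambda$ and the
  family of Hamacher t-conorms is strictly increasing with $\lambda$.
• The family of Hamacher t-norms is continuous with respect to the parameter
  $\lambda$, i.e., for all $\lambda_0 \in [0,\infty]$ we have
  $\lim_{\lambda \to \lambda_0} T^H_\lambda = T^H_{\lambda_0}$.

Additive generators $g^H_\lambda, h^H_\lambda : [0,1] \to [0,\infty]$ of the
continuous Archimedean Hamacher t-norms and t-conorms are given by,
respectively,

\[ g^H_\lambda(t) = \begin{cases} \frac{1-t}{t}, & \text{if } \lambda = 0, \\ \log\frac{(1-\lambda)t + \lambda}{t}, & \text{if } \lambda \in ]0,\infty[, \end{cases} \]

and

\[ h^H_\lambda(t) = \begin{cases} \frac{t}{1-t}, & \text{if } \lambda = 0, \\ \log\frac{(1-\lambda)(1-t) + \lambda}{1-t}, & \text{if } \lambda \in ]0,\infty[. \end{cases} \]

12 A rational function is expressed as a ratio of two polynomials.

Fig. 3.10. 3D plots of some Hamacher t-norms and their additive generators.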
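Frank t-norms

This family originates from the solutions of the functional equation
$G(x,y) + F(x,y) = x + y$, where $F$ and $G$ are associative functions and $F$
satisfies $F(x,1) = F(1,x) = x$ and $F(x,0) = F(0,x) = 0$. Frank showed that
$F$ has to be an ordinal sum of the following family of t-norms.

The family $(T^F_\lambda)_{\lambda \in [0,\infty]}$ of Frank t-norms is given by

\[ T^F_\lambda(x,y) = \begin{cases} \min(x,y), & \text{if } \lambda = 0, \\ T_P(x,y), & \text{if } \lambda = 1, \\ T_L(x,y), & \text{if } \lambda = \infty, \\ \log_\lambda\!\left(1 + \frac{(\lambda^x - 1)(\lambda^y - 1)}{\lambda - 1}\right) & \text{otherwise.} \end{cases} \]

The family $(S^F_\lambda)_{\lambda \in [0,\infty]}$ of Frank t-conorms is given
by

\[ S^F_\lambda(x,y) = \begin{cases} \max(x,y), & \text{if } \lambda = 0, \\ S_P(x,y), & \text{if } \lambda = 1, \\ S_L(x,y), & \text{if } \lambda = \infty, \\ 1 - \log_\lambda\!\left(1 + \frac{(\lambda^{1-x} - 1)(\lambda^{1-y} - 1)}{\lambda - 1}\right) & \text{otherwise.} \end{cases} \]

Limiting cases: $T^F_0 = \min$, $T^F_1 = T_P$, $T^F_\infty = T_L$,
$S^F_0 = \max$, $S^F_1 = S_P$, $S^F_\infty = S_L$.

• For each $\lambda \in [0,\infty]$ the t-norm $T^F_\lambda$ and the t-conorm
  $S^F_\lambda$ are dual to each other.
• $T^F_\lambda$ and $S^F_\lambda$ are continuous for all
  $\lambda \in [0,\infty]$.
• $T^F_\lambda$ is Archimedean if and only if $\lambda \in ]0,\infty]$.
• $T^F_\lambda$ is strict if and only if $\lambda \in ]0,\infty[$ and the
  unique nilpotent Frank t-norm is $T^F_\infty$.

Additive generators $g^F_\lambda, h^F_\lambda : [0,1] \to [0,\infty]$ of the
continuous Archimedean Frank t-norms and t-conorms are given by, respectively,

\[ g^F_\lambda(t) = \begin{cases} -\log t, & \text{if } \lambda = 1, \\ 1 - t, & \text{if } \lambda = \infty, \\ \log\frac{\lambda - 1}{\lambda^t - 1}, & \text{if } \lambda \in ]0,1[ \,\cup\, ]1,\infty[, \end{cases} \]

and

\[ h^F_\lambda(t) = \begin{cases} -\log(1-t), & \text{if } \lambda = 1, \\ t, & \text{if } \lambda = \infty, \\ \log\frac{\lambda - 1}{\lambda^{1-t} - 1}, & \text{if } \lambda \in ]0,1[ \,\cup\, ]1,\infty[. \end{cases} \]

Fig. 3.11. 3D plots of some Frank t-norms and their additive generators.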
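
The functional equation that motivates this family can be checked numerically for a Frank t-norm paired with its dual t-conorm. The sketch below uses hypothetical function names, covers only the generic case $\lambda \in ]0,1[ \,\cup\, ]1,\infty[$, and takes the dual as $S(x,y) = 1 - T(1-x, 1-y)$; it verifies $T(x,y) + S(x,y) = x + y$ on a grid only.

```python
import math


def frank_t_norm(lam):
    """Frank t-norm for lam in ]0,1[ or ]1,inf[ (generic case only)."""
    def t(x, y):
        inner = 1.0 + (lam ** x - 1.0) * (lam ** y - 1.0) / (lam - 1.0)
        return math.log(inner, lam)  # logarithm to base lam
    return t


def dual_t_conorm(t_norm):
    """Dual t-conorm S(x, y) = 1 - T(1 - x, 1 - y)."""
    return lambda x, y: 1.0 - t_norm(1.0 - x, 1.0 - y)


def frank_equation_residual(lam, x, y):
    """Return T(x, y) + S(x, y) - (x + y), which should vanish."""
    t = frank_t_norm(lam)
    s = dual_t_conorm(t)
    return t(x, y) + s(x, y) - (x + y)
```

The residual stays at rounding level for the sampled parameters, in line with the origin of the family described above.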
Yager t-norms

This family was introduced by Yager in 1980 [269]. His idea was to measure the
strength of the logical AND by means of the parameter $\lambda$. Thus
$\lambda = 0$ expresses the smallest AND, and $\lambda = \infty$ expresses the
largest AND.

The family $(T^Y_\lambda)_{\lambda \in [0,\infty]}$ of Yager t-norms is given by

\[ T^Y_\lambda(x,y) = \begin{cases} T_D(x,y), & \text{if } \lambda = 0, \\ \min(x,y), & \text{if } \lambda = \infty, \\ \max(1 - ((1-x)^\lambda + (1-y)^\lambda)^{1/\lambda}, 0) & \text{otherwise.} \end{cases} \]

The family $(S^Y_\lambda)_{\lambda \in [0,\infty]}$ of Yager t-conorms is given
by

\[ S^Y_\lambda(x,y) = \begin{cases} S_D(x,y), & \text{if } \lambda = 0, \\ \max(x,y), & \text{if } \lambda = \infty, \\ \min((x^\lambda + y^\lambda)^{1/\lambda}, 1) & \text{otherwise.} \end{cases} \]

Limiting cases: $T^Y_0 = T_D$, $T^Y_1 = T_L$, $T^Y_\infty = \min$,
$S^Y_0 = S_D$, $S^Y_1 = S_L$, $S^Y_\infty = \max$.

• For each $\lambda \in [0,\infty]$ the t-norm $T^Y_\lambda$ and the t-conorm
  $S^Y_\lambda$ are dual to each other.
• $T^Y_\lambda$, $S^Y_\lambda$ are continuous for all $\lambda \in ]0,\infty]$.
• $T^Y_\lambda$, $S^Y_\lambda$ are Archimedean if and only if
  $\lambda \in [0,\infty[$.
• $T^Y_\lambda$, $S^Y_\lambda$ are nilpotent if and only if
  $\lambda \in ]0,\infty[$, and none of the Yager t-norms are strict.

Additive generators $g^Y_\lambda, h^Y_\lambda : [0,1] \to [0,\infty]$ of the
continuous Archimedean Yager t-norms and t-conorms are given by, respectively,

\[ g^Y_\lambda(t) = (1-t)^\lambda, \]

and

\[ h^Y_\lambda(t) = t^\lambda. \]

Fig. 3.12. 3D plots of some Yager t-norms and their additive generators.
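Dombi t-norms

The following family of t-norms was introduced by Dombi in 1982 [79].

The family $(T^D_\lambda)_{\lambda \in [0,\infty]}$ of Dombi t-norms is given by

\[ T^D_\lambda(x,y) = \begin{cases} T_D(x,y), & \text{if } \lambda = 0, \\ \min(x,y), & \text{if } \lambda = \infty, \\ \frac{1}{1 + \left(\left(\frac{1-x}{x}\right)^\lambda + \left(\frac{1-y}{y}\right)^\lambda\right)^{1/\lambda}} & \text{otherwise.} \end{cases} \]

The family $(S^D_\lambda)_{\lambda \in [0,\infty]}$ of Dombi t-conorms is given
by

\[ S^D_\lambda(x,y) = \begin{cases} S_D(x,y), & \text{if } \lambda = 0, \\ \max(x,y), & \text{if } \lambda = \infty, \\ 1 - \frac{1}{1 + \left(\left(\frac{x}{1-x}\right)^\lambda + \left(\frac{y}{1-y}\right)^\lambda\right)^{1/\lambda}} & \text{otherwise.} \end{cases} \]

Limiting cases: $T^D_0 = T_D$, $T^D_1 = T^H_0$, $T^D_\infty = \min$,
$S^D_0 = S_D$, $S^D_1 = S^H_0$, $S^D_\infty = \max$.

• For each $\lambda \in [0,\infty]$ the t-norm $T^D_\lambda$ and the t-conorm
  $S^D_\lambda$ are dual to each other.
• $T^D_\lambda$ are continuous for all $\lambda \in ]0,\infty]$.
• $T^D_\lambda$ is Archimedean if and only if $\lambda \in [0,\infty[$.
• $T^D_\lambda$ is strict if and only if $\lambda \in ]0,\infty[$, and none of
  the Dombi t-norms are nilpotent.
• The family of Dombi t-norms is strictly increasing and the family of Dombi
  t-conorms is strictly decreasing with the parameter $\lambda$.

Additive generators $g^D_\lambda, h^D_\lambda : [0,1] \to [0,\infty]$ of the
strict Dombi t-norms and t-conorms are given by, respectively,

\[ g^D_\lambda(t) = \left(\frac{1-t}{t}\right)^\lambda, \]

and

\[ h^D_\lambda(t) = \left(\frac{t}{1-t}\right)^\lambda. \]

Fig. 3.13. 3D plots of some Dombi t-norms and their additive generators.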


Aczél-Alsina t-norms

This family of t-norms was introduced by Aczél and Alsina in 1984 in the
context of functional equations.

The family $(T^{AA}_\lambda)_{\lambda \in [0,\infty]}$ of Aczél-Alsina t-norms
is given by

\[ T^{AA}_\lambda(x,y) = \begin{cases} T_D(x,y), & \text{if } \lambda = 0, \\ \min(x,y), & \text{if } \lambda = \infty, \\ e^{-((-\log x)^\lambda + (-\log y)^\lambda)^{1/\lambda}} & \text{otherwise.} \end{cases} \]

The family $(S^{AA}_\lambda)_{\lambda \in [0,\infty]}$ of Aczél-Alsina
t-conorms is given by

\[ S^{AA}_\lambda(x,y) = \begin{cases} S_D(x,y), & \text{if } \lambda = 0, \\ \max(x,y), & \text{if } \lambda = \infty, \\ 1 - e^{-((-\log(1-x))^\lambda + (-\log(1-y))^\lambda)^{1/\lambda}} & \text{otherwise.} \end{cases} \]

Limiting cases: $T^{AA}_0 = T_D$, $T^{AA}_1 = T_P$, $T^{AA}_\infty = \min$,
$S^{AA}_0 = S_D$, $S^{AA}_1 = S_P$, $S^{AA}_\infty = \max$.

• For each $\lambda \in [0,\infty]$ the t-norm $T^{AA}_\lambda$ and the
  t-conorm $S^{AA}_\lambda$ are dual to each other.
• $T^{AA}_\lambda$ are continuous for all $\lambda \in ]0,\infty]$;
  $T^{AA}_0$ is the exception.
• $T^{AA}_\lambda$ is Archimedean if and only if $\lambda \in [0,\infty[$.
• $T^{AA}_\lambda$ is strict if and only if $\lambda \in ]0,\infty[$ and there
  are no nilpotent Aczél-Alsina t-norms.
• The family of Aczél-Alsina t-norms is strictly increasing and the family of
  Aczél-Alsina t-conorms is strictly decreasing.

Additive generators $g^{AA}_\lambda, h^{AA}_\lambda : [0,1] \to [0,\infty]$ of
the strict Aczél-Alsina t-norms and t-conorms are given by, respectively,

\[ g^{AA}_\lambda(t) = (-\log t)^\lambda, \]

and

\[ h^{AA}_\lambda(t) = (-\log(1-t))^\lambda. \]

Fig. 3.14. 3D plots of some Aczél-Alsina t-norms and their additive generators.
Sugeno and Weber t-norms

In 1983 Weber proposed the use of some special t-norms and t-conorms to model
the intersection and the union of fuzzy sets, respectively. This family of
t-conorms had been considered previously in the context of $\lambda$-fuzzy
measures in [230].

The family $(T^{SW}_\lambda)_{\lambda \in [-1,\infty]}$ of Sugeno-Weber t-norms
is given by

\[ T^{SW}_\lambda(x,y) = \begin{cases} T_D(x,y), & \text{if } \lambda = -1, \\ T_P(x,y), & \text{if } \lambda = \infty, \\ \max\!\left(\frac{x + y - 1 + \lambda xy}{1 + \lambda}, 0\right) & \text{otherwise.} \end{cases} \]

The family $(S^{SW}_\lambda)_{\lambda \in [-1,\infty]}$ of Sugeno-Weber
t-conorms is given by

\[ S^{SW}_\lambda(x,y) = \begin{cases} S_P(x,y), & \text{if } \lambda = -1, \\ S_D(x,y), & \text{if } \lambda = \infty, \\ \min(x + y + \lambda xy, 1) & \text{otherwise.} \end{cases} \]

Limiting cases: $T^{SW}_{-1} = T_D$, $T^{SW}_0 = T_L$, $T^{SW}_\infty = T_P$,
$S^{SW}_{-1} = S_P$, $S^{SW}_0 = S_L$, $S^{SW}_\infty = S_D$.

• For $\lambda, \mu \in ]-1,\infty[$ the t-norm $T^{SW}_\lambda$ and the
  t-conorm $S^{SW}_\mu$ are dual to each other if and only if
  $\mu = -\frac{\lambda}{1+\lambda}$. The pairs
  $(T^{SW}_{-1}, S^{SW}_\infty)$ and $(T^{SW}_\infty, S^{SW}_{-1})$ are also
  pairs of dual t-norms and t-conorms.
• All Sugeno-Weber t-norms with the exception of $T^{SW}_{-1}$ are continuous.
• Each $T^{SW}_\lambda$ is Archimedean, and it is nilpotent if and only if
  $\lambda \in ]-1,\infty[$.
• The unique strict Sugeno-Weber t-norm is $T^{SW}_\infty$.

Additive generators $g^{SW}_\lambda, h^{SW}_\lambda : [0,1] \to [0,\infty]$ of
the continuous Archimedean t-norms and t-conorms of this family are given by,
respectively,

\[ g^{SW}_\lambda(t) = \begin{cases} 1 - t, & \text{if } \lambda = 0, \\ -\log t, & \text{if } \lambda = \infty, \\ 1 - \frac{\log(1 + \lambda t)}{\log(1 + \lambda)}, & \text{if } \lambda \in ]-1,0[ \,\cup\, ]0,\infty[, \end{cases} \]

and

\[ h^{SW}_\lambda(t) = \begin{cases} t, & \text{if } \lambda = 0, \\ -\log(1-t), & \text{if } \lambda = -1, \\ \frac{\log(1 + \lambda t)}{\log(1 + \lambda)}, & \text{if } \lambda \in ]-1,0[ \,\cup\, ]0,\infty[. \end{cases} \]

Fig. 3.15. 3D plots of some Sugeno and Weber t-norms and their additive
generators.
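
Unlike in the other families, a Sugeno-Weber t-norm and t-conorm with the same parameter are in general not dual; the parameters are linked by $\mu = -\lambda/(1+\lambda)$. The sketch below (hypothetical names, generic case $\lambda \in ]-1,\infty[$ only, with $\lambda = 3$ picked arbitrarily) checks this relation on a grid.

```python
def sugeno_weber_t_norm(lam):
    """Sugeno-Weber t-norm for lam in ]-1, inf[."""
    return lambda x, y: max((x + y - 1.0 + lam * x * y) / (1.0 + lam), 0.0)


def sugeno_weber_t_conorm(mu):
    """Sugeno-Weber t-conorm for mu in ]-1, inf[."""
    return lambda x, y: min(x + y + mu * x * y, 1.0)


def dual_parameter(lam):
    """Parameter mu for which S_mu is the dual of T_lam."""
    return -lam / (1.0 + lam)


lam = 3.0                      # illustrative value
t_sw = sugeno_weber_t_norm(lam)
s_sw = sugeno_weber_t_conorm(dual_parameter(lam))


def duality_gap(x, y):
    """Difference between S_mu(x, y) and 1 - T_lam(1 - x, 1 - y)."""
    return s_sw(x, y) - (1.0 - t_sw(1.0 - x, 1.0 - y))
```

For $\lambda = 0$ the relation gives $\mu = 0$, which recovers the self-paired Łukasiewicz case.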
Mayor and Torrens t-norms

The t-norms of this family are the only continuous t-norms that satisfy, for
all $x, y \in [0,1]$, the following equation:

\[ T(x,y) = \max(T(\max(x,y), \max(x,y)) - |x - y|, 0). \]

The family $(T^{MT}_\lambda)_{\lambda \in [0,1]}$ of Mayor-Torrens t-norms is
given by

\[ T^{MT}_\lambda(x,y) = \begin{cases} \max(x + y - \lambda, 0), & \text{if } \lambda \in ]0,1] \text{ and } (x,y) \in [0,\lambda]^2, \\ \min(x,y) & \text{otherwise.} \end{cases} \]

The family $(S^{MT}_\lambda)_{\lambda \in [0,1]}$ of Mayor-Torrens t-conorms is
given by

\[ S^{MT}_\lambda(x,y) = \begin{cases} \min(x + y + \lambda - 1, 1), & \text{if } \lambda \in ]0,1] \text{ and } (x,y) \in [1-\lambda,1]^2, \\ \max(x,y) & \text{otherwise.} \end{cases} \]

Limiting cases: $T^{MT}_0 = \min$, $S^{MT}_0 = \max$, $T^{MT}_1 = T_L$ and
$S^{MT}_1 = S_L$.

• For each $\lambda \in [0,1]$ the t-norm $T^{MT}_\lambda$ and the t-conorm
  $S^{MT}_\lambda$ are dual to each other.
• The Mayor-Torrens t-norms and t-conorms are ordinal sums with one summand,
  i.e., $T^{MT}_\lambda = (\langle 0, \lambda, T_L \rangle)$ and
  $S^{MT}_\lambda = (\langle 1 - \lambda, 1, S_L \rangle)$.
• Each $T^{MT}_\lambda$ is a continuous t-norm.
• $T^{MT}_1$ is the unique Archimedean and nilpotent t-norm of the family.
  There are no strict Mayor-Torrens t-norms.
• The family of Mayor-Torrens t-norms is strictly decreasing with $\lambda$
  and the family of Mayor-Torrens t-conorms is strictly increasing.
• The family of Mayor-Torrens t-norms is continuous with respect to the
  parameter $\lambda$, i.e., for all $\lambda_0 \in [0,1]$ we have
  $\lim_{\lambda \to \lambda_0} T^{MT}_\lambda = T^{MT}_{\lambda_0}$.

Additive generators of the nilpotent Mayor-Torrens t-norm $T^{MT}_1$ and
t-conorm $S^{MT}_1$ are given by, respectively,

\[ g^{MT}_1(t) = 1 - t, \]

and

\[ h^{MT}_1(t) = t. \]

Fig. 3.16. 3D plots of some Mayor-Torrens t-norms.
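
The characterizing equation of this family can be tested directly. The sketch below uses hypothetical names and a tolerance-based comparison; it evaluates both sides of $T(x,y) = \max(T(\max(x,y),\max(x,y)) - |x-y|, 0)$ for a Mayor-Torrens t-norm.

```python
def mayor_torrens(lam):
    """Mayor-Torrens t-norm for lam in [0, 1]."""
    def t(x, y):
        if 0.0 < lam <= 1.0 and x <= lam and y <= lam:
            return max(x + y - lam, 0.0)
        return min(x, y)
    return t


def satisfies_equation(t_norm, x, y, tol=1e-12):
    """Check T(x, y) = max(T(m, m) - |x - y|, 0) with m = max(x, y)."""
    m = max(x, y)
    rhs = max(t_norm(m, m) - abs(x - y), 0.0)
    return abs(t_norm(x, y) - rhs) <= tol
```

The equation says that the whole t-norm is determined by its values on the diagonal, which is why a one-parameter description suffices.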
3.4.12 Lipschitz-continuity

Recently the class of k-Lipschitz t-norms, for $k > 1$, has been
characterized. Note that 1-Lipschitz t-norms are copulas, see Section 3.5.

Definition 3.77 (k-Lipschitz t-norm). Let $T : [0,1]^2 \to [0,1]$ be a t-norm
and let $k \in [1,\infty[$ be a constant. Then $T$ is k-Lipschitz if

\[ |T(x_1,y_1) - T(x_2,y_2)| \le k(|x_1 - x_2| + |y_1 - y_2|) \]

for all $x_1, x_2, y_1, y_2 \in [0,1]$.

Note 3.78. In other words, a k-Lipschitz bivariate t-norm has the Lipschitz
constant $k$ in the $\|\cdot\|_1$ norm, see Definition 1.58. Of course
$k \ge 1$, because of the condition $T(t,1) = t$.

The k-Lipschitz property implies the continuity of the t-norm. Recall that a
continuous t-norm can be represented by means of an ordinal sum of continuous
Archimedean t-norms, and that a continuous Archimedean t-norm can be
represented by means of a continuous additive generator [142, 155] (see
Section 3.4.5). Characterization of all k-Lipschitz t-norms can therefore be
reduced to the problem of characterizing all Archimedean k-Lipschitz t-norms.

Note 3.79. It is easy to see that if a t-norm $T$ is k-Lipschitz, it is also
m-Lipschitz for any $m \in [k,\infty[$. The 1-Lipschitz t-norms are exactly
those t-norms that are also copulas (Section 3.5). A strictly decreasing
continuous function $g : [0,1] \to [0,\infty]$ with $g(1) = 0$ is an additive
generator of a 1-Lipschitz Archimedean t-norm if and only if $g$ is convex.

Definition 3.80 (k-convex function). Let $g : [0,1] \to [0,\infty]$ be a
strictly monotone function and let $k \in ]0,\infty[$ be a real constant. Then
$g$ will be called k-convex if

\[ g(x + k\varepsilon) - g(x) \le g(y + \varepsilon) - g(y) \]

holds for all $x \in [0,1[$, $y \in ]0,1[$ with $x \le y$ and
$\varepsilon \in ]0, \min(1 - y, \frac{1-x}{k})]$.

Note 3.81. If $k = 1$ the function $g$ is convex.

Note 3.82. If a strictly monotone function is k-convex then it is also a
continuous function. Observe that a decreasing function $g$ can be k-convex
only for $k \ge 1$. Moreover, when a decreasing function $g$ is k-convex, it is
also m-convex for all $m \ge k$. In the case of a strictly increasing function
$g^*$, it can be k-convex only for $k \le 1$. Moreover, when $g^*$ is k-convex,
it is m-convex for all $m \le k$.

Considering $k \in [1,\infty[$, we provide the following characterization
given in [187].
Proposition 3.83. Let $T : [0,1]^2 \to [0,1]$ be an Archimedean t-norm and let
$g : [0,1] \to [0,\infty]$, $g(1) = 0$, be an additive generator of $T$. Then
$T$ is k-Lipschitz if and only if $g$ is k-convex.

Another useful characterization is the following.

Corollary 3.84 (Y.-H. Shyu). Let $g : [0,1] \to [0,\infty]$ be an additive
generator of a t-norm $T$ which is differentiable on $]0,1[$ and let
$g'(x) < 0$ for $0 < x < 1$. Then $T$ is k-Lipschitz if and only if
$g'(y) \ge k g'(x)$ whenever $0 < x < y < 1$.

Corollary 3.85. Let $T : [0,1]^2 \to [0,1]$ be an Archimedean t-norm and let
$g : [0,1] \to [0,\infty]$ be an additive generator of $T$ such that $g$ is
differentiable on $]0,1[ \setminus S$, where $S \subset [0,1]$ is a discrete
set. Then $T$ is k-Lipschitz if and only if $k g'(x) \le g'(y)$ for all
$x, y \in [0,1]$, $x \le y$, such that $g'(x)$ and $g'(y)$ exist.


Example 3.86. Consider Sugeno-Weber t-norms with an additive generator
given by (p. 162)

$$g_\lambda^{SW}(t) = 1 - \frac{\log(1 + \lambda t)}{\log(1 + \lambda)}$$

for $\lambda \in\, ]-1,0[$ and $\lambda \in\, ]0,\infty[$. The derivative is

$$\frac{d}{dt}\, g_\lambda^{SW}(t) = -\frac{\lambda}{\log(1 + \lambda)} \cdot \frac{1}{1 + \lambda t}.$$

$g_\lambda^{SW}$ is convex for $\lambda > 0$, so for these values $T_\lambda^{SW}$ is a copula. For
$\lambda \in\, ]-1,0[$ the derivative reaches its minimum and maximum at $t = 1$
and $t = 0$ respectively. Thus the condition of Corollary 3.84 holds whenever
$g'(1) \ge k g'(0)$. By eliminating the constant factor, we obtain

$$-\frac{1}{1 + \lambda} \ge -\frac{k}{1 + \lambda \cdot 0}, \quad \text{or} \quad k \ge \frac{1}{1 + \lambda}.$$

Therefore Sugeno-Weber t-norms are k-Lipschitz with $k = \frac{1}{1+\lambda}$ in the mentioned
range. For example, $T_{-1/2}^{SW}$ is 2-Lipschitz. When $\lambda = -1$, this is a
limiting case of the drastic (discontinuous) t-norm.
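
The bound in this example can be checked numerically. The sketch below is only an illustration: it assumes the closed form $\max(0, (x + y - 1 + \lambda x y)/(1 + \lambda))$ for the Sugeno-Weber t-norm (the expression obtained from the generator above), and the function names are hypothetical. It estimates the largest difference quotient in one argument on a grid and compares it with $1/(1+\lambda)$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sugeno-Weber t-norm in closed form (assumed from the additive generator).
double t_sw(double x, double y, double lambda) {
    return std::max(0.0, (x + y - 1.0 + lambda * x * y) / (1.0 + lambda));
}

// Lipschitz constant predicted by the example for lambda in ]-1, 0[.
double sw_lipschitz_constant(double lambda) {
    assert(lambda > -1.0 && lambda < 0.0);
    return 1.0 / (1.0 + lambda);
}

// Largest difference quotient of T in its first argument over an n-by-n grid.
// This is a lower estimate of the true Lipschitz constant, not a proof.
double max_slope_on_grid(double lambda, int n) {
    const double h = 1.0 / n;
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= n; ++j) {
            const double y = j * h;
            const double q =
                (t_sw((i + 1) * h, y, lambda) - t_sw(i * h, y, lambda)) / h;
            worst = std::max(worst, q);
        }
    }
    return worst;
}
```

For $\lambda = -1/2$ the grid estimate stays below the predicted constant 2 and approaches it as the grid is refined near the corner $x = 1$, $y = 0$.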


The following auxiliary results will help to determine whether a given
piecewise differentiable t-norm is k-Lipschitz.


Corollary 3.87. Let $T : [0,1]^2 \to [0,1]$ be an Archimedean t-norm and let
$g : [0,1] \to [0,\infty]$ be its additive generator, differentiable on $]0,1[$, with $g'(t) < 0$
on $]0,1[$. If

$$\inf_{t \in ]x,1[} g'(t) \ge k \sup_{t \in ]0,x[} g'(t)$$

holds for every $x \in\, ]0,1[$ then $T$ is k-Lipschitz.

Proof. Follows from Corollary 3.84.


Corollary 3.88. Let $g : [0,1] \to [0,\infty]$ be a strictly decreasing function,
differentiable on $]0,a[\, \cup\, ]a,1[$. If $g$ is k-convex on $[0,a]$ and on $[a,1]$, and if

$$\inf_{t \in ]a,1[} g'(t) \ge k \sup_{t \in ]0,a[} g'(t),$$

then $g$ is k-convex on $[0,1]$.

Proof. Follows from Corollary 3.85.


Example 3.89. Consider Hamacher t-norms with an additive generator given 
by (p. ee 
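$$g_\lambda^H(t) = \log \frac{(1 - \lambda)t + \lambda}{t}$$

for $\lambda \in\, ]0,\infty[$. The derivative is given by

$$(g_\lambda^H)'(t) = \frac{\lambda}{t^2(\lambda - 1) - t\lambda}.$$

The generator $g_\lambda^H$ is convex (and hence k-convex) on $[0, a]$ with $a = \frac{\lambda}{2(\lambda - 1)}$. On
$[a, 1]$ it is k-convex, by Corollary 3.87, since the infimum of $(g_\lambda^H)'$ is reached
at $t = 1$ and $(g_\lambda^H)'(1) = -\lambda$, and the supremum is reached at $t = a$, and is
$-\frac{4(\lambda - 1)}{\lambda}$. Thus $k \ge \frac{\lambda^2}{4(\lambda - 1)}$. Now, by Corollary 3.88, $g_\lambda^H$ is k-convex on $[0, 1]$, with
the same $k$, since $\sup_{t \in ]0,a[} (g_\lambda^H)'(t) = (g_\lambda^H)'(a)$.

For $\lambda \in [0,2]$ $T_\lambda^H$ is a copula, and for $\lambda \in [2,\infty]$ it is k-Lipschitz with
$k = \frac{\lambda^2}{4(\lambda - 1)}$. For instance $T_4^H$ is $\frac{4}{3}$-Lipschitz.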


3.4.13 Calculation 


Practical calculation of the values of t-norms and t-conorms is done by either
a) using the two-variate expression recursively (see Fig. 1.2), or b) using the
additive generators when they exist (see Fig. 3.17). In both cases one has to be
careful with the limiting cases, which frequently involve infinite expressions
and result in underflow or overflow on a computer. This means that
the generic formula should be used only for those values of the parameter $\lambda$
for which numerical computation is stable.

For example, powers or exponents of $\lambda$ and logarithms base $\lambda$ can only be
computed numerically for a restricted range of values, and this range depends
on x and y as well. Consider the expression $t^\lambda$ in the additive generator $g_\lambda^{SS}$.
We need to take into account the following limiting cases: $t \approx 0$, $\lambda \approx 0$;
$t \approx 0$, $\lambda \to -\infty$; $t \approx 0$, $\lambda \to \infty$; $t \approx 1$, $\lambda \to \pm\infty$. On a computer, the power $t^\lambda$
is typically computed as $\exp(\lambda \log t)$. There is no point in computing $\exp(t)$ for
$t < -20$ or $t > 50$, as the result is smaller than the machine epsilon or is too
large.
When performing calculations using additive generators, one has to exercise
care with the values of t near 0 (and near 1 for t-conorms), as the additive
generators of strict t-norms have an asymptote at this point. The value of 0
(or 1 for t-conorms) should be returned as a special case and evaluation of
the additive generator skipped.

For the purposes of numerical stability, it is possible to scale the additive
generator by a positive factor (recall that it is defined up to an
arbitrary positive factor). This does not affect the numerical value of the
t-norm or t-conorm.


#include <math.h>

typedef double ( *USER_FUNCTION)( double );
double p=1.0, eps=0.001;

/* example of an additive generator with parameter p (Hamacher) */
double g(double t)
{ return log( ((1-p)*t + p)/t ); }

/* inverse of the additive generator g */
double ginv(double t)
{ return p / (exp(t)+p-1); }

/* evaluates the t-norm of x[0..n-1] as gi( g(x[0]) + ... + g(x[n-1]) ) */
double f_eval(int n, double * x, USER_FUNCTION g, USER_FUNCTION gi)
{
   int i;
   double r=0;
   for(i=0;i<n;i++) {
      if (x[i]<= eps) return 0.0;   /* special case: skip the asymptote near 0 */
      r+= g(x[i]);
   }
   return gi(r);
}

/* calling the function */
int main(void)
{
   double x[2];
   x[0]=0.1; x[1]=0.2;
   double z= f_eval(2,x,&g,&ginv);
   return z > 0.0 ? 0 : 1;
}



Fig. 3.17. A C++ code for evaluation of an Archimedean t-norm using its additive 
generator. 


3.4.14 How to choose a triangular norm/conorm 


In this chapter we have seen that there are many different conjunctive/disjunctive
aggregation functions, in particular t-norms and t-conorms. In fact,
there are several infinite families of such functions, and it is easy to construct
even more, by either modifying the additive generators, or using the ordinal
sum construction. The question we want to answer in this section is how to
choose the most suitable aggregation function for a specific application.

The first thing one has to do is to determine all the required application- 
specific properties, which would narrow down the choice. For example, are as- 
sociativity and symmetry required? Are there nilpotent elements? How strong 
is mutual reinforcement of the arguments? 

Still, even after limiting the choices, there are infinitely many functions that
satisfy the application requirements. Then it comes down to using some numerical
data. We shall use a general approach discussed in Section 1.6. Let us have a
set of empirical data, pairs $(\mathbf{x}_k, y_k)$, $k = 1,\ldots,K$, which we want to fit as best
as possible by using an aggregation function from a given class, such as t-norms
or t-conorms. Our goal is to determine the function from that class
that minimizes the norm of the differences between the predicted ($f(\mathbf{x}_k)$) and
observed ($y_k$) values. We will use the least squares or least absolute deviation
criterion, as discussed on p. 33.

Depending on the class of aggregation functions, we have two choices: a) if
the class is a parametric family of functions (e.g., a given family of t-norms),
then we need to determine the best value of such a parameter; b) if the class
is more general (e.g., all continuous t-norms) then we need to consider
non-parametric methods, which we will discuss in Section 3.4.15.

Let us concentrate on fitting a parameter of a given family of t-norms. The
case of t-conorms is reduced to that of t-norms by using duality: consider an
auxiliary data set $\tilde{D} = \{(\tilde{\mathbf{x}}_k, \tilde{y}_k)\}_{k=1}^{K}$, where $\tilde{x}_{ik} = 1 - x_{ik}$, $\tilde{y}_k = 1 - y_k$. Fit a
t-norm $T$ to that data set. The desired t-conorm is the dual of $T$.
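
The duality reduction can be sketched in a few lines. The `Sample` structure and the function names below are hypothetical helpers introduced only for illustration: `dualize` builds the auxiliary data set, and `dual_of` turns a fitted t-norm into the corresponding t-conorm.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// One observation: an input vector of any dimension and the observed output.
struct Sample {
    std::vector<double> x;
    double y;
};

// Auxiliary data set: every input and output is replaced by its complement.
std::vector<Sample> dualize(const std::vector<Sample>& data) {
    std::vector<Sample> out;
    out.reserve(data.size());
    for (const Sample& s : data) {
        assert(s.y >= 0.0 && s.y <= 1.0);
        Sample d;
        for (double xi : s.x) d.x.push_back(1.0 - xi);
        d.y = 1.0 - s.y;
        out.push_back(d);
    }
    return out;
}

using Aggregation = std::function<double(const std::vector<double>&)>;

// Dual of a t-norm T: S(x) = 1 - T(1 - x_1, ..., 1 - x_n).
Aggregation dual_of(Aggregation t) {
    return [t](const std::vector<double>& x) {
        std::vector<double> c;
        for (double xi : x) c.push_back(1.0 - xi);
        return 1.0 - t(c);
    };
}
```

For example, applying `dual_of` to the product t-norm yields the probabilistic sum, so the inputs (0.5, 0.5) give 0.75.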

Take a family of t-norms, say the Yager family $T_\lambda^Y$ (p. 156). Fitting of a
nonlinear parameter $\lambda$ in the least squares sense involves solving the following
optimization problem

$$\min \sum_{k=1}^{K} \left(T_\lambda^Y(\mathbf{x}_k) - y_k\right)^2, \tag{3.14}$$

$\lambda$ unrestricted. In the case of the least absolute deviation criterion (LAD) we
minimize

$$\min \sum_{k=1}^{K} \left|T_\lambda^Y(\mathbf{x}_k) - y_k\right|. \tag{3.15}$$


This is a typical nonlinear optimization problem (smooth (3.14) or
non-smooth (3.15)). There is no guarantee that the objective function is convex,
or has a unique global minimum. Therefore we resort to methods of global
optimization, discussed in Appendix A.5.

When using numerical solutions, one has to bear in mind the specifics of
the problem. 1) Large and small values of $\lambda$ lead to the limiting cases; refer to
the description of the specific family of t-norms. 2) Evaluation of t-norms
requires bounding the range of $\lambda$ to avoid numerical instability. Consequently,
one effectively solves a univariate global optimization problem on one or more
bounded intervals, and replaces the generic formula with the limiting cases on
the rest of the domain.

The methods of choice are grid search with subsequent local descent (by 
using a derivative free non-smooth optimization method), or Pijavski-Shubert 
deterministic method, also with subsequent improvement by local descent. 
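
A minimal sketch of the first option, grid search followed by local refinement, for the least squares criterion and the Yager family. Several things here are assumptions made for illustration: the names `Sample`, `yager`, `sse` and `fit_yager`, the restriction of $\lambda$ to a bounded interval $[lo, hi]$ with $lo > 0$, and the use of golden-section search as a stand-in for the derivative-free local method mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Sample { std::vector<double> x; double y; };

// Yager t-norm for lambda > 0 and any number of arguments.
double yager(const std::vector<double>& x, double lambda) {
    double s = 0.0;
    for (double xi : x) s += std::pow(1.0 - xi, lambda);
    return std::max(0.0, 1.0 - std::pow(s, 1.0 / lambda));
}

// Least squares objective: sum of squared residuals over the data set.
double sse(const std::vector<Sample>& d, double lambda) {
    double e = 0.0;
    for (const Sample& s : d) {
        const double r = yager(s.x, lambda) - s.y;
        e += r * r;
    }
    return e;
}

// Grid search on [lo, hi], then golden-section refinement around the best node.
double fit_yager(const std::vector<Sample>& d, double lo, double hi, int grid) {
    assert(0.0 < lo && lo < hi && grid >= 2);
    const double h = (hi - lo) / grid;
    double best = lo;
    for (int i = 1; i <= grid; ++i)
        if (sse(d, lo + i * h) < sse(d, best)) best = lo + i * h;

    double a = std::max(lo, best - h), b = std::min(hi, best + h);
    const double phi = (std::sqrt(5.0) - 1.0) / 2.0;
    while (b - a > 1e-8) {
        const double c = b - phi * (b - a), e = a + phi * (b - a);
        if (sse(d, c) < sse(d, e)) b = e; else a = c;
    }
    return 0.5 * (a + b);
}
```

Because `yager` accepts vectors of any length, the same routine can be applied unchanged to data with inputs of different dimensions. The refinement step only finds a local minimum inside the bracket selected by the grid, so the grid must be fine enough to isolate the global one.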

The associativity property of t-norms allows one to formulate a more general
data fitting problem. Remember that a family of t-norms is an extended
aggregation function: it is defined for any number of inputs. It is quite
feasible that the same t-norm will be used in an application to aggregate different
numbers of inputs. Is it possible to use empirical data of varying dimension
to fit the whole extended aggregation function (i.e., all n-variate aggregation
functions, n = 2,3,...)?

Thus we consider the data set D in which input vectors $\mathbf{x}_k$ may have
different dimensions, denoted by $n_k$, as illustrated in Table 3.2.


Table 3.2. A data set with inputs of varying dimension. 





Interestingly, the optimization problems (3.14) or (3.15) need no modification,
as long as the t-norm $T_\lambda^Y$ is calculated consistently for any number of
arguments. The resulting optimal parameter $\lambda$ defines an optimal aggregation
function from the chosen family.


3.4.15 How to fit additive generators 


It is often impossible to decide, based on application-specific requirements,
which particular parametric family of t-norms is the most suitable. We will
investigate the problem of fitting an arbitrary continuous t-norm to the data,
and consider separately strict and nilpotent t-norms. We know from Section
3.4.10 that any continuous t-norm can be approximated uniformly and with
any desired accuracy by a continuous Archimedean t-norm. In turn, to fit a
continuous Archimedean t-norm, we need to fit its additive generator, which
is a strictly monotone univariate function. The approach we explore in this
section is to fit additive generators, which is effectively a tool for fitting
arbitrary continuous t-norms.

There are no specific requirements on an additive generator g, besides
monotonicity and satisfying g(1) = 0, g(0.5) = 1. Recall that the latter
condition is used to fix a specific additive generator, because it is not defined
uniquely. The number 0.5 can be replaced with any other number in ]0, 1[. An
additive generator need not be differentiable, or have any specific algebraic
form.

The method of spline approximation is very popular in numerical approximation,
as splines are flexible enough to model functions of any shape (see Appendix A.1).
An additional advantage is that polynomial splines allow one to
represent the condition of monotonicity via a simple set of linear inequalities
involving spline coefficients (13 EJ We will use regression splines, defined as 
a linear combination of B-splines with a priori fixed knots $t_1, t_2, \ldots, t_m$ in the
interval [0, 1],

$$S(t) = \sum_{j=1}^{J} c_j B_j(t), \tag{3.16}$$

with coefficients $c_j$ to be determined from the data. $B_j$ are usually chosen as
B-splines, although other choices are possible [77].

The simplest spline (of degree 1) is the broken line approximation, where
the data points are joined by straight lines (Figure [A-]). The precision of 
spline approximation is easily controlled by increasing the number of knots, 
and because of their extremal properties, polynomial splines are considered 
the “smoothest” curves that interpolate or approximate the data. Monotone 
regression splines are explored in detail in (ial FEJ where the condition of 
monotonicity is reduced to that of non-negativity (non-positivity) of
coefficients $c_j$ in a suitably chosen basis. The spline regression problem is
formulated as a linearly constrained least squares problem, which can be solved by
a variety of methods [152], see Appendix A.3.

Our approach to construction of continuous t-norms consists in using a
monotone regression spline $S$ in (3.16) as an additive generator. We shall use
the data set D in which input vectors $\mathbf{x}_k$ may have different dimensions, as in
Table 3.2. By applying Eq. (3.7) and the least squares criterion, we solve

$$\text{Minimize } \sum_{k=1}^{K} \left(S(x_{1k}) + S(x_{2k}) + \ldots + S(x_{n_k k}) - S(y_k)\right)^2, \tag{3.17}$$

subject to the conditions that $S$ is monotone decreasing, $S(1) = 0$ and
$S(a) = 1$. Convenient choices of the value $a \in\, ]0,1[$ will be discussed later
in this section.

Conditions of monotonicity translate into c; < 0 for the B-spline basis in 
ag. Equality conditions become linear equality constraints 


$$S(1) = \sum_{j=1}^{J} c_j B_j(1) = 0, \qquad S(a) = \sum_{j=1}^{J} c_j B_j(a) = 1.$$


Replacing $S$ with (3.16) and rearranging the terms in the sum, we obtain a
quadratic programming problem

$$\begin{aligned}
\text{Minimize } & \sum_{k=1}^{K} \left( \sum_{j=1}^{J} c_j \left[ B_j(x_{1k}) + B_j(x_{2k}) + \ldots + B_j(x_{n_k k}) - B_j(y_k) \right] \right)^2 \\
\text{s.t. } & \sum_{j=1}^{J} c_j B_j(1) = 0, \\
& \sum_{j=1}^{J} c_j B_j(a) = 1, \\
& c_j < 0.
\end{aligned} \tag{3.18}$$


For numerical purposes, the strict inequalities are converted to $c_j \le -\varepsilon < 0$
for some small $\varepsilon$. Observe that the expression in the square brackets can be
written as a single function $B_j(\mathbf{x}, y)$ to simplify the notation. We also note
that problem (3.18) is of type LSEI (Appendix A.3, Eq. (A.8)), for which
special methods have been designed.

In the case of the least absolute deviation criterion, we obtain an opti- 
mization problem, subsequently converted to a linear programming problem 
(Eq. in the Appendix), namely 








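$$\begin{aligned}
\text{Minimize } & \sum_{k=1}^{K} \left| \sum_{j=1}^{J} c_j B_j(\mathbf{x}_k, y_k) \right| \\
\text{s.t. } & \sum_{j=1}^{J} c_j B_j(1) = 0, \\
& \sum_{j=1}^{J} c_j B_j(a) = 1, \\
& c_j < 0.
\end{aligned} \tag{3.19}$$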

Our next task is to determine $a$. We distinguish two cases: nilpotent
and strict t-norms. For nilpotent t-norms, whose additive generators satisfy
$g(0) < \infty$, the choice is simple: any value of $a$ will do, so we use $a = 0.5$ for
simplicity. However, for strict t-norms we need to model asymptotic behavior
near 0. Polynomial splines are not suitable, as they are finite. Furthermore, the
usual trick of replacing $\infty$ with a large finite number does not work, because
additive generators are defined up to a positive multiplier. That is, setting
$S(0) = 1000$ is equivalent to $S(0) = 1$ or any other number, as this number is
factored out from the objective function in (3.18) or (3.19).

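A workaround is to use well-founded additive generators [136, 137], defined
as

$$g(t) = \begin{cases} \frac{1}{t} + S(\varepsilon) - \frac{1}{\varepsilon}, & \text{if } t \le \varepsilon, \\ S(t), & \text{if } t > \varepsilon. \end{cases}$$

In this case we set $a = \varepsilon$, where $\varepsilon$ is the smallest strictly positive value among
$x_{ik}, y_k$, $i = 1,\ldots,n_k$, $k = 1,\ldots,K$.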

The asymptote near 0 is modeled by the function $1/t$. The reason why we
can use this function (or, in fact, any other function with the same asymptotic
behavior) is that no observed data falls within $]0, \varepsilon[$. Consequently the values
of the additive generator $g$ on $]0,\varepsilon[$ are not used to calculate any quantity in
problems (3.18) or (3.19), and can be chosen with relative freedom, as long
as continuity and monotonicity of $g$ are kept (see a, p. 915).
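
The construction of a well-founded generator can be sketched as a wrapper around any finite decreasing function $S$. The names `Generator` and `well_founded` are hypothetical; the spline itself is assumed to be supplied by the caller, and the explicit value at $t = 0$ is an implementation choice made here to avoid a division by zero.

```cpp
#include <cassert>
#include <functional>
#include <limits>

using Generator = std::function<double(double)>;

// Wrap a finite, decreasing function S into a well-founded generator:
//   g(t) = 1/t + S(eps) - 1/eps  for t <= eps,   g(t) = S(t)  for t > eps.
// The shift makes g continuous at t = eps.
Generator well_founded(Generator S, double eps) {
    assert(eps > 0.0 && eps < 1.0);
    const double shift = S(eps) - 1.0 / eps;
    return [S, eps, shift](double t) -> double {
        if (t <= 0.0) return std::numeric_limits<double>::infinity();
        return t <= eps ? 1.0 / t + shift : S(t);
    };
}
```

With a linear $S(t) = 2(1 - t)$ and $\varepsilon = 0.1$, the wrapped generator coincides with $S$ above 0.1, joins it continuously at 0.1, and grows without bound as $t$ approaches 0.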

Finally, differentiability of an additive generator can be imposed by using
quadratic monotone regression splines. Quadratic B-splines are easily
calculated using recursive equations (see Appendix A.1), and conditions of
monotonicity are quite similar to those for linear splines [13].

Figure 3.18 illustrates approximation of an additive generator by linear


splines. The data (xx, yk) were generated by using Dombi t-norm T,2,, whose 
additive generator is given by 


g(t) = es 


These data were used to solve problem (3.18), with the well-founded additive
generator. Fig. 3.18 shows the original additive generator and its linear spline
approximation (left), and the actual approximation to the t-norm (right).








Fig. 3.18. Approximation of the Dombi t-norm TP, (right) and of its additive 


generator (left), using 20 randomly generated data. The spline approximation is the 
solid piecewise linear curve. 


Preservation of ordering of the outputs 


We recall from Section 1.6, p. 34, that sometimes one has not only to fit
an aggregation function to numerical data, but also to preserve the ordering of
the outputs. That is, if $y_j \le y_k$ then we expect $f(\mathbf{x}_j) \le f(\mathbf{x}_k)$. We already
discussed approaches to preservation of output ordering when dealing with
averaging aggregation functions. For t-norms, we can rely on a very similar
technique.

First, arrange the data, so that the outputs are in non-decreasing order,
i.e., $y_k \le y_{k+1}$, $k = 1,\ldots,K-1$. Define the additional linear constraints

$$\sum_{j=1}^{J} c_j \Big( \big[B_j(x_{1k}) + \ldots + B_j(x_{n_k k})\big] - \big[B_j(x_{1,k+1}) + \ldots + B_j(x_{n_{k+1},k+1})\big] \Big) \ge 0,$$

$k = 1,\ldots,K-1$. The change of the sign of the inequality is due to the fact
that $S$ is decreasing. Add the above constraints to problem (3.18) or (3.19)
and solve it.

The addition of the extra K — 1 constraints does not change the structure 
of the optimization problem, nor drastically affect its complexity. 


Approximation of copulas 


One important subclass of Archimedean t-norms is Archimedean copulas, see
Section 3.5. Archimedean copulas are characterized by convex additive
generators. Convexity of the splines is imposed by adding extra linear
constraints on the coefficients: $c_{j+1} - c_j \ge 0$, $j = 1,\ldots,J-1$.


3.4.16 Introduction of weights 


There were numerous studies of the issue of introducing weights into
conjunctive and disjunctive aggregation functions, and in particular t-norms and
t-conorms adag (and references therein). Weighting vectors 
played a very important role in the context of various means.
Weights represented such concepts as the importance of the criteria,
importance of an expert's opinion, or quality and reliability of information sources.
It turns out that similar techniques are applicable to t-norms and t-conorms,
as well as various mixed type aggregation functions discussed in Chapter 4.

We briefly outline one such process, applicable to Archimedean t-norms 
and t-conorms, which allows one to obtain weighted versions of these aggre- 
gation functions. 

First, let us establish the fundamental properties of weighted aggregation.
Let $T$ be a t-norm and $T_{\mathbf{w}}$ be its weighted counterpart. The vector of weights $\mathbf{w}$
must have non-negative components, but we do not require its normalization,
such as $\sum w_i = 1$; we simply require $w_i \ge 0$.


• If all weights $w_i = 1$, then $T_{\mathbf{w}} = T$.
• If any $w_i = 0$, then the i-th input is irrelevant and $T_{\mathbf{w}}(\mathbf{x}) = T_{\tilde{\mathbf{w}}}(\tilde{\mathbf{x}})$, where
$\tilde{\mathbf{w}}, \tilde{\mathbf{x}}$ are obtained from $\mathbf{w}, \mathbf{x}$ by removing the i-th component.


Note 3.90. Since t-norms have neutral element $e = 1$, we have an interesting
counterpart of the second property, $T_{\mathbf{w}}(\mathbf{x}) = T_{\mathbf{w}}(\tilde{\mathbf{x}})$, where $\tilde{\mathbf{x}}$ is a vector obtained from
$\mathbf{x}$ by replacing those components that correspond to zero weights with ones. For
example, $T_{(0,w_2,w_3)}(x_1, x_2, x_3) = T_{(0,w_2,w_3)}(1, x_2, x_3) = T_{(w_2,w_3)}(x_2, x_3)$.
Consider now continuous Archimedean t-norms, which are expressed via
additive generators $g$ as

$$T(\mathbf{x}) = g^{(-1)}\left(\sum_{i=1}^{n} g(x_i)\right).$$


The method proposed in [s3 consists in replacing this expression with 
its weighted analogue, defined as follows. 


Definition 3.91 (Weighted Archimedean t-norms). Let $g : [0,1] \to [0,\infty]$,
$g(1) = 0$, be an additive generator of some Archimedean t-norm $T$, and
$\mathbf{w}$, $w_i \ge 0$, be a (not necessarily normalized) weighting vector. The weighted
Archimedean t-norm is defined as

$$T_{\mathbf{w}}(\mathbf{x}) = g^{(-1)}\left(\sum_{i=1}^{n} w_i g(x_i)\right). \tag{3.21}$$


Note 3.92. In the original formulation there was an additional constraint
$\sum_{i=1}^{n} w_i = n$, but it was not included in the later studies.


Note 3.93. Weighted t-norms may acquire averaging behavior, in particular they
convert into weighted quasi-arithmetic means if $\sum_{i=1}^{n} w_i = 1$. However, weighted
t-norms are not conjunctive aggregation functions, unless $w_i \ge 1$, $i = 1,\ldots,n$.


Example 3.94. Weighted product t-norm ($g(t) = -\log t$) is given as

$$T_{P,\mathbf{w}}(\mathbf{x}) = \prod_{i=1}^{n} x_i^{w_i}.$$

Note the similarity to the geometric mean $G_{\mathbf{w}}$, in which case the weighting
vector needs to be normalized by $\sum_{i=1}^{n} w_i = 1$.


Example 3.95. Weighted Lukasiewicz t-norm is given by

$$T_{L,\mathbf{w}}(\mathbf{x}) = \max\left(0, 1 - \sum_{i=1}^{n} w_i(1 - x_i)\right).$$
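
The weighted product and Lukasiewicz t-norms can both be evaluated through the generic formula (3.21). The sketch below is an illustration under assumptions: the names `weighted_tnorm`, `g_prod`, `ginv_prod`, `g_luk` and `ginv_luk` are hypothetical, and skipping zero weights is a choice made here to avoid the indeterminate product $0 \cdot \infty$ when an input equals 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

using Fn = std::function<double(double)>;

// T_w(x) = ginv( sum_i w_i * g(x_i) ), with ginv the pseudo-inverse of g.
double weighted_tnorm(const std::vector<double>& w, const std::vector<double>& x,
                      const Fn& g, const Fn& ginv) {
    assert(w.size() == x.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (w[i] == 0.0) continue;  // zero weight: the input is irrelevant
        s += w[i] * g(x[i]);
    }
    return ginv(s);
}

// Product: g(t) = -log t, inverse exp(-s).
double g_prod(double t)    { return -std::log(t); }
double ginv_prod(double s) { return std::exp(-s); }

// Lukasiewicz: g(t) = 1 - t, pseudo-inverse max(0, 1 - s).
double g_luk(double t)    { return 1.0 - t; }
double ginv_luk(double s) { return std::max(0.0, 1.0 - s); }
```

With weights (2, 0.5), the inputs (0.5, 0.64) give $0.5^2 \cdot 0.64^{0.5} = 0.2$ for the product generator, and the inputs (0.9, 0.8) give $1 - (0.2 + 0.1) = 0.7$ for the Lukasiewicz generator, in agreement with the closed forms of the two examples.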


Example 3.96. Weighted Yager t-norm is given by

$$T_{\lambda,\mathbf{w}}^{Y}(\mathbf{x}) = \max\left(0, 1 - \left(\sum_{i=1}^{n} w_i(1 - x_i)^\lambda\right)^{1/\lambda}\right).$$
Example 3.97. Weighted Frank t-norm is given by

$$T_{\lambda,\mathbf{w}}^{F}(\mathbf{x}) = \log_\lambda\left(1 + \frac{\prod_{i=1}^{n} (\lambda^{x_i} - 1)^{w_i}}{(\lambda - 1)^{\sum_{i=1}^{n} w_i - 1}}\right).$$


By using duality, we obtain a similar construction for weighted t-conorms.


Definition 3.98 (Weighted Archimedean t-conorms). Let $h : [0,1] \to [0,\infty]$,
$h(0) = 0$, be an additive generator of some Archimedean t-conorm
$S$, and $\mathbf{w}$, $w_i \ge 0$, be a (not necessarily normalized) weighting vector. The
weighted Archimedean t-conorm is defined as

$$S_{\mathbf{w}}(\mathbf{x}) = h^{(-1)}\left(\sum_{i=1}^{n} w_i h(x_i)\right). \tag{3.22}$$


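Example 3.99. Referring to Examples 3.94-3.97 we have

$$S_{P,\mathbf{w}}(\mathbf{x}) = 1 - \prod_{i=1}^{n} (1 - x_i)^{w_i},$$

$$S_{L,\mathbf{w}}(\mathbf{x}) = \min\left(1, \sum_{i=1}^{n} w_i x_i\right),$$

$$S_{\lambda,\mathbf{w}}^{Y}(\mathbf{x}) = \min\left(1, \left(\sum_{i=1}^{n} w_i x_i^\lambda\right)^{1/\lambda}\right),$$

$$S_{\lambda,\mathbf{w}}^{F}(\mathbf{x}) = 1 - \log_\lambda\left(1 + \frac{\prod_{i=1}^{n} (\lambda^{1 - x_i} - 1)^{w_i}}{(\lambda - 1)^{\sum_{i=1}^{n} w_i - 1}}\right).$$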


Yager provided an interesting view of the above mentioned process.
Let us introduce a bivariate function $H : [0,1]^2 \to [0,1]$, called the importance
transformation function, defined by

$$H(w,t) = g^{(-1)}(w\, g(t)).$$

Then we can express (3.21) as

$$T_{\mathbf{w}} = T(H(w_1,x_1),\ldots,H(w_n,x_n)),$$

provided that $w_i \in [0,1]$; the weights need not sum to one (see footnote 13).
13 In fact, restrictions $w_i \le 1$ are not necessary, $H$ can be defined on $[0,\infty] \times [0,1]$.
However, it loses its interpretation as an implication function, see footnote 14.

The function $H$ satisfies the following properties:

• $H(1,t) = g^{(-1)}(g(t)) = t$;
• $H(0,t) = g^{(-1)}(0) = 1$;
• $H(w,t)$ is non-decreasing in $t$ and is non-increasing in $w$.
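
As an illustration of the importance transformation, the sketch below uses the Lukasiewicz generator $g(t) = 1 - t$, for which $H(w,t) = \max(0, 1 - w(1 - t))$. The function names are hypothetical and the choice of the Lukasiewicz t-norm is an assumption made only to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// H(w, t) = ginv(w * g(t)) for g(t) = 1 - t and pseudo-inverse max(0, 1 - s).
double importance(double w, double t) {
    assert(w >= 0.0 && t >= 0.0 && t <= 1.0);
    return std::max(0.0, 1.0 - w * (1.0 - t));
}

// Lukasiewicz t-norm of n arguments.
double lukasiewicz(const std::vector<double>& x) {
    double s = 0.0;
    for (double xi : x) s += 1.0 - xi;
    return std::max(0.0, 1.0 - s);
}

// T_w(x) = T(H(w_1, x_1), ..., H(w_n, x_n)).
double weighted_via_importance(const std::vector<double>& w,
                               const std::vector<double>& x) {
    assert(w.size() == x.size());
    std::vector<double> h;
    for (std::size_t i = 0; i < x.size(); ++i)
        h.push_back(importance(w[i], x[i]));
    return lukasiewicz(h);
}
```

With weights (2, 0.5) and inputs (0.9, 0.8) the transformed inputs are 0.8 and 0.9, and the result 0.7 agrees with the weighted Lukasiewicz formula of Example 3.95.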


Note 3.100. The above mentioned properties are exactly the properties of some im- 
plication functions [4, notably S- and R-implications. Then we can have an alterna- 
tive definition of weighted t-norms, starting from an arbitrary S- or R- implication 
I, as 

$$T_{\mathbf{w}} = T(I(w_1,x_1),\ldots,I(w_n,x_n)).$$


Note 3.101. Yager suggests that the importance transformation function $H$
may be defined using a different additive generator $g$ from that of the t-norm $T$, i.e.,
$H(w,t) = g^{(-1)}(w\, g(t))$.


For weighted t-conorms we have an analogous expression

$$S_{\mathbf{w}} = S(\tilde{H}(w_1,x_1),\ldots,\tilde{H}(w_n,x_n)),$$

with function $\tilde{H}(w,t) = h^{(-1)}(w\, h(t))$. However, now $h$ is an additive
generator of a t-conorm, and is increasing, hence function $\tilde{H}$ has a different
set of properties compared to $H$, making it inconsistent with implication
functions. Namely $\tilde{H}$ is non-decreasing in both arguments and satisfies
$\tilde{H}(0,t) = 0$. By using a strong negation $N$ one can represent it via an
implication function as $\tilde{H}(w,t) = N(I(w, N(t)))$. For S-implication it yields
$\tilde{H}(w,t) = N(S(N(w), N(t))) = T(w,t)$, the t-norm dual to $S$.


Example 3.102. Let $S$ be max, $S_1$ be some t-conorm which generates an
S-implication $I$, and $\mathbf{w}$ be a weighting vector. Using $\tilde{H}(w,t) = N(I(w, N(t)))$
we obtain

$$S_{\mathbf{w}} = \max\{\tilde{H}(w_1,x_1),\ldots,\tilde{H}(w_n,x_n)\} = \max\{T_1(w_1,x_1),\ldots,T_1(w_n,x_n)\},$$

where $T_1$ is the t-norm dual to $S_1$. This is a well-known weighted max function


[33,97 Bed. 


14 A bivariate function $I : [0,1]^2 \to [0,1]$ is called an implication function [og,
if:

• $I$ is non-increasing in the first variable;
• $I$ is non-decreasing in the second variable;
• $I(0,0) = I(0,1) = I(1,1) = 1$ and $I(1,0) = 0$.

S-implications are defined by $I_S(x,y) = S(N(x), y)$, where $S$ is a t-conorm and
$N$ is a strong negation. An R-implication is defined by $I_R(x,y) = \sup\{z \in [0,1] \mid T(x,z) \le y\}$,
where $T$ is a left-continuous t-norm. There are also other
implications, see = bd ra 
Example 3.103. Let $T = \min$, $S$ be some t-conorm, and let $\mathbf{w}$ be a weighting
vector. Using the same argument we obtain a weighted t-conorm

$$S_{\mathbf{w}} = S(\tilde{H}(w_1,x_1),\ldots,\tilde{H}(w_n,x_n)) = S(\min(w_1,x_1),\ldots,\min(w_n,x_n)).$$


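Of course, we can also use any other t-norm instead of min. A further
generalization is obtained by using

$$S_{\mathbf{w}} = S(A(w_1,x_1),\ldots,A(w_n,x_n)),$$

where $A : [0,\infty[\, \times [0,1] \to [0,1]$ is the transformation function defined by

$$A(w,t) = \sup\Big\{ y \in [0,1] \;\Big|\; \exists\, i,j \in \{1,2,\ldots\},\ \tfrac{i}{j} < w \text{ and } u \in [0,1] \text{ such that } \underbrace{S(u,\ldots,u)}_{j\text{-times}} < t \text{ and } y = \underbrace{S(u,\ldots,u)}_{i\text{-times}} \Big\}.$$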


A is a generalization of the usual multiplication, see for details. 


Other weighted aggregation functions based on t-norms and t-conorms are 
considered ia ai BA 


J 


Construction based on composition 


Let us now consider an alternative construction method, based on a
composition of an averaging aggregation function $f$, such as a weighted
quasi-arithmetic mean or Choquet integral discussed in Chapter 2, and a
non-decreasing function $\psi : [0,1] \to [0,1]$, $\psi(0) = 0$, $\psi(1) = 1$, using Proposition
1.85. Under suitable conditions, namely $\psi(f(1,\ldots,1,t,1,\ldots,1)) \le t$ for all
$t \in [0,1]$ and at any position, the composition $\psi \circ f$ is a conjunctive
aggregation function (which was not generally the case for weighted t-norms).
Consider the following examples.


Example 3.104. Let $f$ be a quasi-arithmetic mean with a strictly decreasing
generating function $g$, such that $g(1) = 0$ (recall that generating functions
are defined up to a linear transformation, see Section 2.3.2, so such a
choice is possible if 1 is not an absorbing element of $f$). Then condition
$\psi(f(1,\ldots,1,t,1,\ldots,1)) \le t$ entails

$$\psi(g^{-1}(g(t)/n)) \le t.$$

Take $\psi(t) = g^{(-1)}(n g(t)) = g^{-1}(\min(g(0), n g(t)))$. Then we have $\psi(0) = 0$,
$\psi(1) = 1$ and

$$\psi(g^{-1}(g(t)/n)) = g^{-1}(\min(g(0), g(t))) = t.$$

Consequently, $\psi \circ f$ is a conjunctive aggregation function with the neutral
element $e = 1$.
Example 3.105. A specific instance of Example 3.104 is the case of power
means, with

$$g(t) = \begin{cases} t^r - 1, & \text{if } r < 0, \\ -(t^r - 1), & \text{if } r > 0, \\ -\log(t), & \text{if } r = 0. \end{cases}$$

For $r \ne 0$ define $\psi(t) = (\max(0, n t^r - (n - 1)))^{1/r}$ and $\psi(t) = t^n$ for $r = 0$.
The function $\psi \circ f$ becomes

$$(\psi \circ f)(\mathbf{x}) = \begin{cases} \left(\max\left(0, \sum_{i=1}^{n} x_i^r - (n - 1)\right)\right)^{1/r}, & \text{if } r \ne 0, \\ \prod_{i=1}^{n} x_i, & \text{if } r = 0, \end{cases}$$

which is the Schweizer-Sklar family of triangular norms, p. 150. On the other
hand, if we take $g(t) = (1 - t)^r$, $r > 0$, we can use $\psi(t) = \max(0, 1 - n^{1/r}(1 - t))$,
and in this case we obtain

$$(\psi \circ f)(\mathbf{x}) = \max\left(0, 1 - \left(\sum_{i=1}^{n} (1 - x_i)^r\right)^{1/r}\right),$$

i.e., the Yager family of triangular norms, p. 156.


Example 3.106. Consider a weighted quasi-arithmetic mean with a strictly
decreasing generating function $g$, such that $g(1) = 0$, and strictly positive
weighting vector $\mathbf{w}$. As in Example 3.104, take

$$\psi(t) = g^{(-1)}\left(\frac{g(t)}{\min w_i}\right) = g^{-1}\left(\min\left(g(0), \frac{g(t)}{\min w_i}\right)\right).$$

Then we have $\psi(0) = 0$, $\psi(1) = 1$ and for all $j = 1,\ldots,n$

$$\psi(g^{-1}(w_j g(t))) = g^{-1}\left(\min\left(g(0), \frac{w_j}{\min w_i}\, g(t)\right)\right) \le t.$$

Consequently, $\psi \circ f$ is a conjunctive aggregation function for a given $f$.
Moreover, it is the strongest conjunctive function of the form $\psi \circ f$. It depends
on the weighting vector $\mathbf{w}$, but it differs from the weighted t-norms. Note
that such a weighted conjunctive function can be written with the help of the
importance transformation function $H : [0,\infty] \times [0,1] \to [0,1]$ (see p. 178) as

$$T_{\tilde{\mathbf{w}}}(\mathbf{x}) = T(H(\tilde{w}_1,x_1),\ldots,H(\tilde{w}_n,x_n))$$

with the modified vector $\tilde{\mathbf{w}} = \left(\frac{w_1}{\min w_i},\ldots,\frac{w_n}{\min w_i}\right)$.


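Example 3.107. As a specific instance, consider weighted arithmetic mean $M_{\mathbf{w}}$
with a strictly positive weighting vector $\mathbf{w}$. Take $\psi(t) = \max\left(0, \frac{t}{w^*} - \left(\frac{1}{w^*} - 1\right)\right)$,
with $w^* = \min w_i$. Then the corresponding weighted conjunctive function is

$$(\psi \circ f)(\mathbf{x}) = \max\left(0, \frac{1}{w^*}\left(M_{\mathbf{w}}(\mathbf{x}) - 1 + w^*\right)\right).$$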
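
The formula of this example is simple enough to check numerically. The sketch below assumes that the weights are strictly positive and sum to one (so that $M_{\mathbf{w}}$ is a weighted arithmetic mean); the function name `conjunctive_wam` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// (psi o M_w)(x) = max(0, (M_w(x) - 1 + w*) / w*), with w* = min_i w_i.
// Assumes strictly positive weights that sum to one.
double conjunctive_wam(const std::vector<double>& w, const std::vector<double>& x) {
    assert(!w.empty() && w.size() == x.size());
    double mean = 0.0, wmin = w[0];
    for (std::size_t i = 0; i < x.size(); ++i) {
        assert(w[i] > 0.0);
        mean += w[i] * x[i];
        wmin = std::min(wmin, w[i]);
    }
    return std::max(0.0, (mean - 1.0 + wmin) / wmin);
}
```

With weights (0.2, 0.3, 0.5), the input (0.7, 1, 1) returns 0.7, because the first input carries the smallest weight, while (1, 1, 0.6) returns a value not exceeding 0.6. In both cases the output is bounded by the minimum of the inputs, as required of a conjunctive function.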


In the same way weighted conjunctive aggregation functions can be
constructed from other averaging functions such as OWA and Choquet integrals
treated in Sections 2.5 and 2.6.
3.5 Copulas and dual copulas


The problem of the construction of distributions with given marginals can be
reduced, thanks to Sklar's theorem [228], to construction of a copula. There
exist different methods for constructing copulas, presented, for instance, in
[196]. We proceed with the definition of bivariate copulas.


Definition 3.108 (Copula). A bivariate copula is a function $C : [0,1]^2 \to [0,1]$
which satisfies:

• $C(x,0) = C(0,x) = 0$ and $C(x,1) = C(1,x) = x$ for all $x \in [0,1]$ (boundary
conditions);
• $C(x_1,y_1) - C(x_1,y_2) - C(x_2,y_1) + C(x_2,y_2) \ge 0$, for all $x_1, y_1, x_2, y_2 \in [0,1]$
such that $x_1 \le x_2$ and $y_1 \le y_2$ (2-increasing property).


In statistical analysis, a joint distribution $H$ of a pair of random variables
$(X,Y)$ with marginals $F$ and $G$ respectively, can be expressed by
$H(x,y) = C(F(x),G(y))$ for each $(x,y) \in [-\infty,\infty]^2$, where $C$ is a copula
uniquely determined on $\mathrm{Ran}(F) \times \mathrm{Ran}(G)$.


Example 3.109.

• The product, Lukasiewicz and minimum t-norms are copulas.
• The function $C_u : [0,1]^2 \to [0,1]$ defined for each $u \in [0,1[$ as follows

$$C_u(x,y) = \begin{cases} \max(x + y - 1, u), & (x,y) \in [u,1]^2, \\ \min(x,y), & \text{otherwise} \end{cases}$$

is also a copula.
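
The two defining conditions of a copula can be tested on a grid. The sketch below is a necessary check only, not a proof: a function that fails it is not a copula, but passing it on a finite grid does not establish the property everywhere. The names `Bivariate`, `make_cu` and `looks_like_copula` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

using Bivariate = std::function<double(double, double)>;

// C_u from the example: max(x + y - 1, u) on [u,1]^2, min(x, y) elsewhere.
Bivariate make_cu(double u) {
    assert(u >= 0.0 && u < 1.0);
    return [u](double x, double y) {
        return (x >= u && y >= u) ? std::max(x + y - 1.0, u) : std::min(x, y);
    };
}

// Check the boundary conditions and the 2-increasing property on an
// n-by-n grid. Checking adjacent cells suffices for all grid rectangles.
bool looks_like_copula(const Bivariate& c, int n, double tol = 1e-12) {
    auto at = [n](int i) { return static_cast<double>(i) / n; };
    for (int i = 0; i <= n; ++i) {
        const double t = at(i);
        if (std::abs(c(t, 0.0)) > tol || std::abs(c(0.0, t)) > tol) return false;
        if (std::abs(c(t, 1.0) - t) > tol || std::abs(c(1.0, t) - t) > tol) return false;
    }
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            const double v = c(at(i), at(j)) - c(at(i), at(j + 1))
                           - c(at(i + 1), at(j)) + c(at(i + 1), at(j + 1));
            if (v < -tol) return false;
        }
    }
    return true;
}
```

On a 20-by-20 grid the product and $C_{0.5}$ pass the check, whereas the drastic product fails the 2-increasing test in the cell adjacent to the corner (1, 1).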


Main properties


• Copulas are monotone non-decreasing (and thus are aggregation functions);
• Copulas verify $T_L \le C \le \min$ (and thus are conjunctive functions);
• Copulas are not necessarily symmetric or associative;
• Copulas satisfy the 1-Lipschitz property, and are hence continuous;
• A convex combination of copulas is a copula;
• A t-norm is a copula if and only if it is 1-Lipschitz.


Note 3.110. Every associative copula is a continuous t-norm [14], p. 204. But not
every t-norm is an associative and symmetric copula (take $T_D$ as a counterexample).
Not every copula is a t-norm, for instance $C(x,y) = xy + x^2 y(1 - x)(1 - y)$ is an
asymmetric copula and therefore not a t-norm.
Definition 3.111. Let $C$ be a copula.


• The function $\tilde{C} : [0,1]^2 \to [0,1]$ given by

$$\tilde{C}(x,y) = x + y - C(x,y)$$

is called the dual of the copula $C$ (it is not a copula);
• The function $C^* : [0,1]^2 \to [0,1]$ given by

$$C^*(x,y) = 1 - C(1 - x, 1 - y)$$

is called the co-copula of $C$ (it is not a copula);
• The function $C^t(x,y) = C(y,x)$ is also a copula, called the transpose of
$C$. A copula is symmetric if and only if $C^t = C$.


Note 3.112. For historical reasons the term "dual copula" does not refer to the dual
in the sense of Definition 1.54 of the dual aggregation function. Co-copula is the
dual in the sense of Definition 1.54.


Note 3.113. The dual of an associative copula, $\tilde{C}$, is not necessarily associative. The
dual copulas of Frank t-norms (p. 154) (which are copulas themselves), are also
associative (but disjunctive) symmetric aggregation functions.


If $C$ is a bivariate copula, then the following functions are also copulas:

$$C_1(x,y) = x - C(x, 1 - y);$$
$$C_2(x,y) = y - C(1 - x, y);$$
$$C_3(x,y) = x + y - 1 + C(1 - x, 1 - y).$$


The concept of a quasi-copula (Definition 3.8 on p. 125) is more general
than that of a copula. It was introduced by Alsina et al. in 1983 [5]. Each
copula is a quasi-copula but the converse is not always true. All quasi-copulas
$Q$ verify $T_L \le Q \le \min$. Any copula can be uniquely determined by
means of its diagonal (see [195]).

Another related concept is semicopula (Definition 3.6), which includes the
class of quasi-copulas and hence the class of copulas.





Note 3.114. Note the following statements:


• A semicopula $C$ that satisfies the 2-increasing condition $C(x_1,y_1) - C(x_1,y_2) -
C(x_2,y_1) + C(x_2,y_2) \ge 0$, for all $x_1, y_1, x_2, y_2 \in [0,1]$ such that $x_1 \le x_2$ and
$y_1 \le y_2$, is a copula.

• A semicopula $Q$ that satisfies the Lipschitz condition $|Q(x_1,y_1) - Q(x_2,y_2)| \le
|x_1 - x_2| + |y_1 - y_2|$ is a quasi-copula.

• A semicopula $T$ that is both symmetric and associative is a t-norm.
Example 3.115. The drastic product is a semicopula but not a quasi-copula.
The function $S(x,y) = xy\max(x,y)$ is a semicopula but is not a t-norm
because it is not associative. However, the function

$$T(x,y) = \begin{cases} 0, & \text{if } (x,y) \in [0,1/2] \times [0,1[, \\ \min(x,y), & \text{otherwise} \end{cases}$$

is not a quasi-copula (not Lipschitz), but is a semicopula, since it has neutral
element $e = 1$.


Archimedean copulas 


An important class of copulas are the Archimedean copulas. They are useful
for several reasons: a) they can be constructed easily; b) a wide variety of
copulas belong to this class; c) copulas in this class have special properties.
Archimedean copulas appeared, for the first time, in the study of probabilistic
metric spaces.

Archimedean copulas are characterized by convex additive generators.


Proposition 3.116. Let g : [0,1] — [0,00] be a continuous strictly decreasing 
function with g(1) = 0, and let g ® be the pseudo-inverse of g. The function 
C : [0,1]? — [0,1] given by 


C(z,y) = g5” (g(a) + g(y)) (3.23) 
is a copula if and only if g is convex. 


This result gives us a way to construct copulas: we only need to find a
continuous, strictly decreasing and convex function g from [0,1] to [0,∞],
with g(1) = 0, and then define the copula by C(x, y) = g^(−1)(g(x) + g(y)).
For instance, if g(t) = 1/t − 1 we have C(x, y) = xy/(x + y − xy). For short
we will denote this copula by C = Π/(Σ − Π).
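The construction in Proposition 3.116 is easy to try numerically. The Python sketch below is only an illustration: the helper name archimedean_copula and the hand-written pseudo-inverses are our own assumptions, not part of the original construction. It builds C from a generator and its pseudo-inverse and lets one compare the result for g(t) = 1/t − 1 with the closed form xy/(x + y − xy). The second generator, g(t) = 1 − t, is an assumed example with finite g(0), for which the pseudo-inverse has to be truncated at zero.

```python
import math

def archimedean_copula(g, g_pinv):
    """Return C(x, y) = g_pinv(g(x) + g(y)), cf. (3.23)."""
    return lambda x, y: g_pinv(g(x) + g(y))

# Generator from the text: g(t) = 1/t - 1, with g(0) = +infinity.
def g_strict(t):
    return math.inf if t == 0 else 1.0 / t - 1.0

def g_strict_pinv(s):
    # g(0) is infinite, so the pseudo-inverse is the ordinary inverse.
    return 0.0 if math.isinf(s) else 1.0 / (1.0 + s)

# Assumed second example: g(t) = 1 - t, with g(0) = 1 finite.
def g_bounded(t):
    return 1.0 - t

def g_bounded_pinv(s):
    # The pseudo-inverse equals 0 for arguments beyond g(0) = 1.
    return max(1.0 - s, 0.0)

C_strict = archimedean_copula(g_strict, g_strict_pinv)
C_bounded = archimedean_copula(g_bounded, g_bounded_pinv)

def closed_form(x, y):
    # xy / (x + y - xy), with the value 0 when x = y = 0
    return 0.0 if x == 0 and y == 0 else x * y / (x + y - x * y)
```

With these definitions C_bounded(x, y) coincides with max(x + y − 1, 0), and both functions have neutral element 1.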


Definition 3.117. (Archimedean copula) A copula given by (3.23), with
g : [0,1] → [0,∞] being a continuous strictly decreasing function with
g(1) = 0, is called Archimedean. The function g is its additive generator.
If g(0) = ∞, we say that g is a strict generator; then g^(−1) = g^−1 and
C(x, y) = g^−1(g(x) + g(y)) is called a strict Archimedean copula. If
g(0) < ∞, then C is a nilpotent copula. If the function g is an additive
generator of C, then the function θ(t) = e^{−g(t)} is a multiplicative
generator of C.


Note 3.118. Archimedean copulas are a subclass of continuous Archimedean
t-norms (those with a convex additive generator).

A function g is convex if and only if
g(αt_1 + (1 − α)t_2) ≤ αg(t_1) + (1 − α)g(t_2)
for all t_1, t_2 ∈ Dom(g) and α ∈ [0,1].

Example 3.119.

1. C(x, y) = xy is a strict Archimedean copula with an additive generator
   g(t) = −log t.

2. C(x, y) = max(x + y − 1, 0) is also an Archimedean copula with an
   additive generator g(t) = 1 − t.

3. C(x, y) = min(x, y) is not an Archimedean copula.

4. C(x, y) = xy e^{−a log x log y} is an Archimedean copula for a ∈ (0,1],
   with an additive generator g(t) = log(1 − a log t), called the
   Gumbel-Barnett copula [196]. When a = 0, C(x, y) = xy.


Note 3.120. Among many properties of Archimedean copulas C, with additive
generators g, we will quote the following:¹⁶

• C is symmetric;
• C is associative;
• If c > 0 is a real number then c·g is also an additive generator of the
  same copula.
• Archimedean property: for any u, v ∈ ]0,1[, there is a positive integer n
  such that C(u, …, u) < v, where u is repeated n times.


We summarize different families of parameterized copulas in Table 3.3. The
limiting cases of these copulas are presented in Table 3.4. For a comprehensive
overview of copulas see 


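Table 3.3. Some parameterized families of copulas.

#1  C_λ(x, y) = max{(x^(−λ) + y^(−λ) − 1)^(−1/λ), 0};
    g_λ(t) = (t^(−λ) − 1)/λ;  λ ∈ [−1, ∞) \ {0}
#2  C_λ(x, y) = max{1 − [(1 − x)^λ + (1 − y)^λ]^(1/λ), 0};
    g_λ(t) = (1 − t)^λ;  λ ∈ [1, ∞)
#3  C_λ(x, y) = xy / (1 − λ(1 − x)(1 − y));
    g_λ(t) = log((1 − λ(1 − t))/t);  λ ∈ [−1, 1)
#4  C_λ(x, y) = exp(−[(−log x)^λ + (−log y)^λ]^(1/λ));
    g_λ(t) = (−log t)^λ;  λ ∈ [1, ∞)
#5  C_λ(x, y) = −(1/λ) log(1 + (e^(−λx) − 1)(e^(−λy) − 1)/(e^(−λ) − 1));
    g_λ(t) = −log((e^(−λt) − 1)/(e^(−λ) − 1));  λ ∈ (−∞, ∞) \ {0}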

The family of copulas #1 is called the Clayton family, the family #3 is
called the Ali-Mikhail-Haq family (they belong to the Hamacher family of
t-norms), the family #4 is called the Gumbel-Hougaard family (they belong to
the Aczél-Alsina family of t-norms) and the family #5 is known as the Frank
family (the same as the family of Frank t-norms).

Among the parameterized families of t-norms mentioned in Table 3.1, the
following are copulas: all Frank t-norms, Schweizer-Sklar t-norms with
λ ∈ [−∞, 1], Hamacher t-norms with λ ∈ [0, 2], Yager t-norms with
λ ∈ [1, ∞], Dombi t-norms with λ ∈ [1, ∞], Sugeno-Weber t-norms with
λ ∈ [0, ∞], Aczél-Alsina t-norms with λ ∈ [1, ∞] and all Mayor-Torrens
t-norms.

16 These properties are due to the fact that Archimedean copulas are
continuous Archimedean t-norms.


Table 3.4. Limiting and special cases of copulas in Table 3.3.

#   Limiting and special cases of copulas
1   C_{−1} = T_L, C_0 = T_P, C_1 = Π/(Σ − Π), C_∞ = min
2   C_1 = T_L, C_∞ = min
3   C_0 = T_P, C_1 = Π/(Σ − Π)
4   C_1 = T_P, C_∞ = min
5   C_{−∞} = T_L, C_0 = T_P, C_∞ = min





3.6 Other conjunctive and disjunctive functions 


T-norms and t-conorms (or their weighted counterparts) are by no means the
only conjunctive and disjunctive aggregation functions. We already gave the
definition of copulas (Definition 3.108) and semi- and quasi-copulas
(Definitions 3.6 and 3.8), which are conjunctive aggregation functions with
neutral element e = 1. In this section we discuss how to construct many
parameterized families of aggregation functions (of any dimension) based on a given
semicopula, a monotone non-decreasing univariate function g : [0,1] — [0,1] 
and a pseudo-disjunction, defined below. This construction was proposed in 
ag, and we will closely follow this paper. We will only specify the results for 
conjunctive aggregation functions, as similar results for disjunctive functions 
are obtained by duality. 


Definition 3.121 (Generating function). A monotone non-decreasing function
g : [0,1] → [0,1], g(1) = 1 will be called a generating function.


Definition 3.122 (Pseudo-disjunction). A monotone non-decreasing function
s : [0,1]^n → [0,1], with absorbing element a = 1 is called a
pseudo-disjunction.


Note 3.123. A pseudo-disjunction is not always an aggregation function, as
it may fail the condition s(0) = 0. For example, s(x) = 1 for all
x ∈ [0,1]^n is a pseudo-disjunction.

Proposition 3.124. Let h_1 and h_2 be a bivariate and an n-variate
semicopula respectively, and let s : [0,1]^n → [0,1] be a
pseudo-disjunction. The functions

f_1(x) = h_1(h_2(x), s(x)),
f_2(x) = h_1(s(x), h_2(x))

are also semicopulas.


Example 3.125. Let us take the basic t-norms T_L, min and T_P as
semicopulas, and their dual t-conorms as pseudo-disjunctions. Then we obtain
the following extended aggregation functions

f_1(x) = min(x) S_L(x),                  (h_1 = T_P, h_2 = min, s = S_L);
f_2(x) = T_P(x) max(x),                  (h_1 = h_2 = T_P, s = max);
f_3(x) = T_P(x) S_L(x),                  (h_1 = h_2 = T_P, s = S_L);
f_4(x) = max{min(x) + S_L(x) − 1, 0},    (h_1 = T_L, h_2 = min, s = S_L).

These aggregation functions are defined for any n (hence we call them
extended aggregation functions), but they are not associative.
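To make Example 3.125 concrete, the following Python sketch evaluates the four functions for input vectors of any length. The function names are our own illustrative choices, and math.prod assumes Python 3.8 or later.

```python
import math

def t_p(x):
    """Product t-norm T_P of a list of inputs."""
    return math.prod(x)

def s_l(x):
    """Lukasiewicz t-conorm S_L (bounded sum) of a list of inputs."""
    return min(1.0, sum(x))

def f1(x):
    return min(x) * s_l(x)                   # h1 = T_P, h2 = min, s = S_L

def f2(x):
    return t_p(x) * max(x)                   # h1 = h2 = T_P, s = max

def f3(x):
    return t_p(x) * s_l(x)                   # h1 = h2 = T_P, s = S_L

def f4(x):
    return max(min(x) + s_l(x) - 1.0, 0.0)   # h1 = T_L, h2 = min, s = S_L
```

Each function returns t for an input such as [t, 1, 1] (neutral element e = 1) and never exceeds min(x). Non-associativity can be observed, for instance, by comparing f2([f2([0.5, 0.9]), 0.7]) with f2([0.5, f2([0.9, 0.7])]).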


Let us take n generating functions g_1, g_2, …, g_n. It is clear that if s
is a disjunctive aggregation function, then
s̃(x) = s(g_1(x_1), g_2(x_2), …, g_n(x_n)) is a pseudo-disjunction: it is
monotone, and s̃(x) = 1 whenever some x_i = 1, because g_i(1) = 1 and s ≥ max.
Then we have


Corollary 3.126. Let h_1 and h_2 be a bivariate and an n-variate semicopula
respectively, s be a disjunctive aggregation function, g_i(t), i = 1,…,n
be generating functions, and let s̃ : [0,1]^n → [0,1] be defined as
s̃(x) = s(g_1(x_1), g_2(x_2), …, g_n(x_n)). The functions

f_1(x) = h_1(h_2(x), s̃(x)),
f_2(x) = h_1(s̃(x), h_2(x))

are semicopulas.


By using parameterized families of pseudo-disjunctions and t-norms, we 
obtain large classes of asymmetric non-associative semicopulas, exemplified 
below for n = 2. 


Example 3.127. Let

\[
g_p(t) = \begin{cases}
1, & \text{if } t \ge p, \\
0 & \text{otherwise},
\end{cases}
\]

for p ∈ [0,1]. Let h_2 = min, h_1 be an arbitrary semicopula, and s an
arbitrary disjunctive aggregation function. Then

\[
f(x,y) = h_1(\min(x,y), s(g_p(x), g_q(y))) = \begin{cases}
\min(x,y), & \text{if } x \ge p \text{ or } y \ge q, \\
0 & \text{otherwise}.
\end{cases}
\]

Fig. 3.19. 3D plots of semicopulas in Example 3.127 with p = 0.4, q = 0.8
(left) and p = 0.7, q = 0.5 (right).


Example 3.128. Let g_p(t) = max(1 − p(1 − t), 0), h_2 = min, h_1 = T_P and
s = max. Then

f(x, y) = h_1(min(x, y), s(g_p(x), g_q(y)))
        = min(x, y) · max{1 − p(1 − x), 1 − q(1 − y), 0}.

Example 3.129. Let g_p(t) = t^p, h_1 = h_2 = min, and s = max. Then

f(x, y) = h_1(min(x, y), s(g_p(x), g_q(y))) = min{min(x, y), max(x^p, y^q)}.

• If h_1 = T_P we obtain f(x, y) = min(x, y) · max(x^p, y^q).
  For p = q this simplifies to

  \[
  f(x,y) = \begin{cases}
  x^p y, & \text{if } x > y, \\
  x y^p & \text{otherwise}.
  \end{cases}
  \]

• If s = S_L we obtain f(x, y) = min(x, y) · min(1, x^p + y^q).
• If s = S_P we get f(x, y) = min(x, y)(x^p + y^q − x^p y^q).

Example 3.130. Using h_1 = h_2 = T_P, g_p(t) = t^p and various s we get

f(x, y) = xy max(x^p, y^q),
f(x, y) = xy(x^p + y^q − x^p y^q),
f(x, y) = xy min(1, x^p + y^q).

Fig. 3.20. 3D plots of semicopulas in Example 3.129 with p = 3, q = 5 and
f(x, y) = min{min(x, y), max(x^p, y^q)} (left), and p = 5, q = 3 and
f(x, y) = min(x, y) · max(x^p, y^q) (right).

Fig. 3.21. 3D plots of semicopulas in Example 3.129 with p = 1, q = 2 and
f(x, y) = min(x, y) · min(1, x^p + y^q) (left), and with p = 2, q = 8 and
f(x, y) = min(x, y)(x^p + y^q − x^p y^q) (right).

Example 3.131. Several other semicopulas based on this construction have
been proposed in the literature, for instance

f(x, y) = min(x, y)(x + y − xy)^p.

Extension of the above examples to the n-variate case is straightforward.

Fig. 3.22. 3D plots of semicopulas in Example 3.130 with p = 2, q = 8 and
f(x, y) = xy max(x^p, y^q) (left), and with p = 3, q = 5 and
f(x, y) = xy min(1, x^p + y^q) (right).
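The bivariate construction f(x, y) = h_1(h_2(x, y), s(g_p(x), g_q(y))) used in the examples of this section can be sketched in a few lines of Python. The names make_semicopula and power_gen are hypothetical helpers introduced only for this illustration, and the values p = 2, q = 8 are an arbitrary choice.

```python
def power_gen(p):
    """Generating function g_p(t) = t**p (non-decreasing, g_p(1) = 1)."""
    return lambda t: t ** p

def make_semicopula(h1, h2, s, gx, gy):
    """Return f(x, y) = h1(h2(x, y), s(gx(x), gy(y)))."""
    return lambda x, y: h1(h2(x, y), s(gx(x), gy(y)))

def t_p(a, b):
    return a * b            # product t-norm

def s_p(a, b):
    return a + b - a * b    # probabilistic sum

p, q = 2, 8
# h1 = T_P, h2 = min, s = max: f(x, y) = min(x, y) * max(x**p, y**q)
f_max = make_semicopula(t_p, min, max, power_gen(p), power_gen(q))
# h1 = h2 = T_P, s = S_P: f(x, y) = x*y*(x**p + y**q - x**p * y**q)
f_sp = make_semicopula(t_p, t_p, s_p, power_gen(p), power_gen(q))
```

Because p differs from q, the resulting functions are not symmetric: f_max(0.5, 0.9) and f_max(0.9, 0.5) give different values, while f(x, 1) = x and f(1, y) = y still hold.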


3.7 Noble reinforcement 


In this section we concentrate on disjunctive aggregation functions with some 
special properties, because of the application they come from. By duality, 
equivalent results are easily obtained for conjunctive functions. 

As we know, disjunctive aggregation results in mutual reinforcement of
inputs. However, in some cases such a reinforcement has to be limited, and
the standard aggregation functions we considered so far (notably t-conorms)
are not suitable. Consider aggregation of inputs in recommender systems,
which are frequently used in e-commerce. Recommender systems recommend
various products to customers based on information collected from customers
(their explicit preferences or preferences deduced from their purchase
history). A number of available products (alternatives) are ranked by using
several criteria (e.g., whether a customer likes mystery movies, comedies,
etc.). This way one has a vector of inputs x, in which each component
x_i ∈ [0,1] denotes the degree of satisfaction of the i-th criterion. Such
degrees are called justifications. Each justification by itself is
sufficient to recommend a product, but more than one justification provides
a stronger recommendation. The products are shortlisted and displayed in the
order of the degree of recommendation.

In our terminology, we have an aggregation function f which combines 
justifications, with the properties: 


• f is continuous;
• f has neutral element e = 0 (and thus f is disjunctive);
• f is symmetric.


We need f to be defined for any number of arguments, but we do not require
associativity. It appears that any triangular conorm S will be suitable as
an aggregation function. However, when we evaluate S(x) for some typical
vectors of inputs, we see that t-conorms have an undesirable tail effect:
aggregation of several small inputs results in a very strong recommendation.
For example, take the Łukasiewicz t-conorm, the inputs x = (0.1, 0.1, …, 0.1)
and n = 10:

\[ S_L(\mathbf{x}) = \min\Big\{1, \sum_{i=1}^{10} 0.1\Big\} = 1, \]

i.e., we obtain the strongest recommendation. Similarly, take
x = (0.3, …, 0.3) and the probabilistic sum t-conorm

\[ S_P(\mathbf{x}) = 1 - \prod_{i=1}^{10} (1 - 0.3) \approx 0.97, \]


also a very strong recommendation. In fact, for all parametric families of
t-conorms we have considered so far (except the maximum t-conorm), we have a
similar effect: several weak justifications reinforce each other to give a
very strong recommendation. In the context of recommender systems this is
undesirable: it is not intuitive to recommend a product with several very
weak justifications, and moreover, rank it higher than a product with just
one or two strong justifications. A customer is more likely to purchase a
product which strongly matches one or two criteria than a product which does
not match any of the criteria well.

In contrast, the maximum t-conorm does not provide any reinforcement: 
the value of the recommendation is that of the strongest justification only. 
This is also undesirable. 
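The tail effect, and the lack of any reinforcement in the maximum, are easy to reproduce. The short Python sketch below (function and variable names are our own) evaluates the Łukasiewicz and probabilistic sum t-conorms on the two vectors of weak inputs used above and, for comparison, on an assumed vector with a single strong justification of 0.9.

```python
import math

def s_lukasiewicz(x):
    """Lukasiewicz t-conorm: bounded sum of the inputs."""
    return min(1.0, math.fsum(x))   # fsum avoids rounding error in the sum

def s_prob_sum(x):
    """Probabilistic sum: 1 - prod(1 - x_i)."""
    return 1.0 - math.prod(1.0 - xi for xi in x)

weak_01 = [0.1] * 10             # ten very weak justifications
weak_03 = [0.3] * 10             # ten weak justifications
one_strong = [0.9] + [0.0] * 9   # a single strong justification

for name, x in [("weak 0.1", weak_01), ("weak 0.3", weak_03),
                ("one strong", one_strong)]:
    print(name, s_lukasiewicz(x), round(s_prob_sum(x), 4), max(x))
```

Both t-conorms rank the vectors of weak inputs above the single strong justification, whereas the maximum simply returns the largest input.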

R. Yager proposed the concept of noble reinforcement, where only 
strong inputs reinforce each other, while weak inputs do not. In what fol- 
lows, we define the thresholds for weak and strong justifications (which are in 
fact labels of fuzzy sets), and then discuss prototypical situations, in which 
reinforcement of inputs needs to be limited. These include the number of jus- 
tifications, the number of independent justifications, the number of strong and 
weak justifications and also various combinations. 

In its simplest form, the noble reinforcement requirement can be defined 
verbally as follows. 

Noble reinforcement requirement 1 “If some justifications highly rec- 
ommend an object, without completely recommending it, we desire to allow 
these strongly supporting scores to reinforce each other.” 

To express this requirement mathematically, we will first define a crisp
threshold α to characterize high values. Let α ∈ [0,1], and the interval
[α,1] be the set of high input values. Later we will fuzzify this interval
by using TSK methodology, but at the moment we concentrate on crisp
intervals. Let also E ⊆ {1,…,n} denote a subset of indices and Ē denote its
complement.


Definition 3.132. An extended aggregation function F_α has a noble
reinforcement property with respect to a crisp threshold α ∈ [0,1] if it can
be expressed as

\[
F_\alpha(\mathbf{x}) = \begin{cases}
A_{i \in E}(\mathbf{x}), & \text{if } \exists E \subseteq \{1,\dots,n\} \mid \forall i \in E: x_i \ge \alpha
  \text{ and } \forall i \in \bar{E}: x_i < \alpha, \\
\max(\mathbf{x}) & \text{otherwise},
\end{cases} \tag{3.24}
\]

where A_{i∈E}(x) is a disjunctive extended aggregation function (i.e.,
greater than or equal to the maximum), applied only to the components of x
with the indices in E.

That is, only the components of x greater than α are reinforced. This
definition immediately implies that no continuous Archimedean t-conorm has
the noble reinforcement property (this follows from Proposition 3.35).
Therefore we shall use the ordinal sum construction.


Proposition 3.133. Let S be a t-conorm and let α ∈ [0,1]. Define a
triangular conorm by means of an ordinal sum (⟨α, 1, S⟩), or explicitly,

\[
S_\alpha(x_1, x_2) = \begin{cases}
\alpha + (1-\alpha)\, S\!\left(\dfrac{x_1-\alpha}{1-\alpha}, \dfrac{x_2-\alpha}{1-\alpha}\right), & \text{if } x_1, x_2 \ge \alpha, \\
\max(x_1, x_2) & \text{otherwise},
\end{cases} \tag{3.25}
\]

where S ≠ max. Then S_α(x_1, …, x_n) (defined by using associativity for any
dimension n) has the noble reinforcement property.
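A minimal Python sketch of the ordinal sum (3.25) follows. The choices S = probabilistic sum and α = 0.7 are assumptions made for illustration, and the helper names are ours; the n-ary version is obtained by folding the bivariate function over the inputs, which is legitimate because S_α is associative.

```python
from functools import reduce

def prob_sum(a, b):
    """Probabilistic sum t-conorm, used here as S."""
    return a + b - a * b

def ordinal_sum_conorm(s, alpha):
    """Bivariate S_alpha from (3.25): s rescaled to [alpha, 1]^2, max elsewhere."""
    def s_alpha(x1, x2):
        if alpha < 1.0 and x1 >= alpha and x2 >= alpha:
            u = (x1 - alpha) / (1.0 - alpha)
            v = (x2 - alpha) / (1.0 - alpha)
            return alpha + (1.0 - alpha) * s(u, v)
        return max(x1, x2)
    return s_alpha

def aggregate(s2, x):
    """Extend an associative bivariate function to a list of inputs."""
    return reduce(s2, x)

s_07 = ordinal_sum_conorm(prob_sum, 0.7)
```

Here aggregate(s_07, [0.1] * 10) returns 0.1, so ten weak inputs are not reinforced, while aggregate(s_07, [0.8, 0.9, 0.2]) exceeds 0.9 because the two inputs above the threshold reinforce each other.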


Note 3.134. A special case of the t-conorm in Proposition 3.133 is the dual
of a Dubois-Prade t-norm (see [149], [284]), expressed as (⟨0, α, T_P⟩).


Proposition 3.133 gives a generic solution to the noble reinforcement
problem, defined (through associativity) for any number of arguments. Next
we discuss refinements of the noble reinforcement requirement, which involve
not only the threshold for high values, but also the minimal number of
inputs to be reinforced, their independence, and also the presence of low
input values. In many systems, notably the recommender systems, some
criteria may not be independent, e.g., when various justifications measure
essentially the same concept. It is clear that mutual reinforcement of
correlated criteria should be smaller than reinforcement of truly
independent criteria.

First we specify the corresponding requirements in words and then give
their mathematical definitions.


Requirement 2 Provide reinforcement if at least k arguments are high. 


Requirement 3 Provide reinforcement of at least k high scores, if at least 
m of these scores are very high. 


Requirement 4 Provide reinforcement of at least k high scores, if we 
have at most m low scores. 


Requirement 5 Provide reinforcement of at least k > 1 independent high 
scores. 


All these requirements are formulated using fuzzy sets of high, very high
and low scores, and also a fuzzy set of independent high scores. We shall
formulate the problem first using crisp sets, and then fuzzify them using
TSK (Takagi-Sugeno-Kang) methodology.

Define three crisp thresholds α, β, γ, γ < α < β; the interval [α,1] will
denote high scores, the interval [β,1] will denote very high scores, and the
interval [0,γ] will denote low scores.

Translating the above requirements into mathematical terms, we obtain:


Definition 3.135. An extended aggregation function F_{α,k} provides noble
reinforcement of at least k arguments with respect to a crisp threshold
α ∈ [0,1], if it can be expressed as

\[
F_{\alpha,k}(\mathbf{x}) = \begin{cases}
A_{i \in E}(\mathbf{x}), & \text{if } \exists E \subseteq \{1,\dots,n\} \mid |E| \ge k, \\
 & \forall i \in E: x_i \ge \alpha, \text{ and } \forall i \in \bar{E}: x_i < \alpha, \\
\max(\mathbf{x}) & \text{otherwise},
\end{cases} \tag{3.26}
\]

where A_{i∈E}(x) is a disjunctive extended aggregation function, applied to
the components of x with the indices in E.
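The crisp definition (3.26) translates almost literally into code. In the hedged Python sketch below, E is taken to be the set of all indices with x_i ≥ α (the only set that can satisfy the condition in (3.26)), and the probabilistic sum is an assumed choice for the disjunctive function A; the function names are ours. Setting k = 1 gives a function of the form (3.24).

```python
import math

def prob_sum(values):
    """Probabilistic sum, used here as the disjunctive function A."""
    return 1.0 - math.prod(1.0 - v for v in values)

def noble_reinforcement(x, alpha, k=1, A=prob_sum):
    """Crisp F_{alpha,k}: reinforce the high inputs if there are at least k."""
    high = [v for v in x if v >= alpha]   # components with indices in E
    if len(high) >= max(k, 1):
        return A(high)
    return max(x)
```

For example, with α = 0.7 the input [0.8, 0.9, 0.2] gives about 0.98 for k = 1 but only max(x) = 0.9 for k = 3. Comparing the inputs [0.69, 0.9] and [0.7, 0.9] shows a jump in the output at the threshold.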


Definition 3.136. An extended aggregation function F_{α,β,k,m} provides
noble reinforcement of at least k high values, with at least m very high
values, with respect to thresholds α, β ∈ [0,1], α < β if it can be
expressed as

\[
F_{\alpha,\beta,k,m}(\mathbf{x}) = \begin{cases}
A_{i \in E}(\mathbf{x}), & \text{if } \exists E \subseteq \{1,\dots,n\} \mid |E| \ge k, \\
 & \forall i \in E: x_i \ge \alpha, \ \forall i \in \bar{E}: x_i < \alpha, \\
 & \text{and } \exists D \subseteq E \mid |D| = m, \ \forall i \in D: x_i \ge \beta, \\
\max(\mathbf{x}) & \text{otherwise},
\end{cases} \tag{3.27}
\]

where A_{i∈E}(x) is any disjunctive extended aggregation function, applied
only to the components of x with the indices in E.


Definition 3.137. An extended aggregation function F_{α,γ,k,m} provides
noble reinforcement of at least k high values, with at most m low values,
with respect to thresholds α, γ ∈ [0,1], γ < α if it can be expressed as

\[
F_{\alpha,\gamma,k,m}(\mathbf{x}) = \begin{cases}
A_{i \in E}(\mathbf{x}), & \text{if } \exists E \subseteq \{1,\dots,n\} \mid |E| \ge k, \\
 & \forall i \in E: x_i \ge \alpha, \ \forall i \in \bar{E}: x_i < \alpha, \\
 & \text{and } \exists D \subseteq \{1,\dots,n\} \mid |D| = n - m, \ \forall i \in D: x_i > \gamma, \\
\max(\mathbf{x}) & \text{otherwise},
\end{cases} \tag{3.28}
\]

where A_{i∈E}(x) is any disjunctive extended aggregation function, applied
only to the components of x with the indices in E.

That is, we desire to have reinforcement when the scores are high or
medium, and explicitly prohibit reinforcement if some of the scores are low.
In the above, 1 ≤ k ≤ n and 0 ≤ m ≤ n − k; when m = 0 we prohibit
reinforcement when at least one low score is present.


Construction of the extended aggregation functions using Definitions
3.135-3.137 seems to be straightforward (we only need to choose an
appropriate disjunctive function A_{i∈E}); however, on closer examination,
it leads to discontinuous functions. For applications we would like to have
continuity, and moreover Lipschitz continuity, as it would guarantee
stability of the outputs with respect to input inaccuracies. A general
method of such construction is presented in Chapter 6, and it involves
methods of monotone Lipschitz interpolation. We will postpone the technical
discussion till Chapter 6 and at the moment assume that suitable extended
aggregation functions F_{α,k}, F_{α,β,k,m}, F_{α,γ,k,m} are given.


Fuzzification 


Next we fuzzify the intervals of high, very high and low values. Define a
fuzzy set high by means of a membership function μ_h(t). Note that μ_h(t) is
a strictly increasing bijection of [0,1]. Similarly, we define the fuzzy set
very high by means of a membership function μ_vh(t), also a strictly
increasing bijection of [0,1], and the fuzzy set low by means of a strictly
decreasing bijection μ_l(t). Let us also define a membership function μ_q(j)
to denote membership values in the fuzzy set the minimal number of
components. The higher the number of high components of x, starting from
some minimal number, the stronger the reinforcement. Let us also denote
min(E) = min_{i∈E} x_i.

Yager calculates the value of the extended aggregation function with noble
reinforcement (based on Definition 3.132) using

\[
F(\mathbf{x}) = \max_{E} \{ \mu_h(\min(E))\, A_{i \in E}(\mathbf{x}) + (1 - \mu_h(\min(E))) \max(\mathbf{x}) \}
 = \max_{E} \{ \max(\mathbf{x}) + \mu_h(\min(E)) (A_{i \in E}(\mathbf{x}) - \max(\mathbf{x})) \}. \tag{3.29}
\]

Similarly, based on Definition 3.135 we obtain

\[
F(\mathbf{x}) = \max_{k,E} \{ \max(\mathbf{x}) + \mu_h(\min(E))\, \mu_q(k) (F_{\alpha,k}(\mathbf{x}) - \max(\mathbf{x})) \}, \tag{3.30}
\]

where F_{α,k}(x) is computed for fixed k and E. Based on Definition 3.136 we
obtain

\[
F(\mathbf{x}) = \max_{k,E,D} \{ \max(\mathbf{x}) + \mu_h(\min(E))\, \mu_{vh}(\min(D)) (F_{\alpha,\beta,k,m}(\mathbf{x}) - \max(\mathbf{x})) \}. \tag{3.31}
\]

Based on Definition 3.137 we obtain

\[
F(\mathbf{x}) = \max_{k,E,D} \{ \max(\mathbf{x}) + \mu_h(\min(E)) (1 - \mu_l(\min(D))) (F_{\alpha,\gamma,k,m}(\mathbf{x}) - \max(\mathbf{x})) \}. \tag{3.32}
\]



As far as the requirement of independent scores is concerned, we need to
define a function which measures the degree of independence of the criteria.
We shall use a monotone non-increasing function defined on the subsets of
criteria μ_I : 2^N → [0,1] (N = {1,…,n}) such that the value μ_I(E)
represents the degree of mutual independence of the criteria in the set
E ⊆ {1,2,…,n}. Note that having a larger subset does not increase the degree
of independence: A ⊆ B ⟹ μ_I(A) ≥ μ_I(B). We assume that the function μ_I(E)
is given.


Example 3.138. Consider a simplified recommender system for online movie
sales, which recommends movies based on the following criteria. The user
likes movies: 1) mystery, 2) detectives, 3) drama, and 4) science fiction.
One could use the following membership function μ_I(E)

μ_I({i}) = 1, i = 1,…,4;
μ_I({1,2}) = μ_I({1,2,3}) = μ_I({1,2,4}) = 0.7;
μ_I({1,3}) = μ_I({1,4}) = μ_I({2,3}) = μ_I({2,4}) = 1;
μ_I({3,4}) = μ_I({2,3,4}) = μ_I({1,3,4}) = μ_I({1,2,3,4}) = 0.5.


We obtain an aggregation function which satisfies Requirement 5 by taking
the maximum over all possible subsets of criteria E in which reinforcement
takes place,

\[
F(\mathbf{x}) = \max_{E} \big( \max(\mathbf{x}) + \mu_h(\min(E))\, \mu_q(k)\, \mu_I(E) (A_{i \in E}(\mathbf{x}) - \max(\mathbf{x})) \big), \tag{3.33}
\]

where A_{i∈E} is computed as in (3.24) but only with a fixed subset E.


Note 3.139. Note that the maximum over all E is essential: because the
function μ_I is non-increasing, without it the function F(x) may fail to be
monotone. For example, consider the problem with three linearly dependent,
but pairwise independent criteria, μ_I({1,2}) = μ_I({2,3}) = μ_I({1,3}) = 1,
μ_I({1,2,3}) = 0. Now if all components of x are high we have no
reinforcement. But if two components are high, and the remaining one is
small, we have reinforcement. We ensure monotonicity by taking maximum
reinforcement over all possible combinations of the criteria.


4
Mixed Functions


4.1 Semantics 


Mixed aggregation functions are those whose behavior depends on the inputs. 
These functions exhibit conjunctive, disjunctive or averaging behavior on dif- 
ferent parts of their domain. We have the following general definition. 


Definition 4.1 (Mixed aggregation). An aggregation function is mixed if 
it is neither conjunctive, nor disjunctive or averaging, i.e., it exhibits different 
types of behavior on different parts of the domain. 


Note 4.2. An immediate consequence of the above definition is that mixed aggrega- 
tion functions are not comparable with min and/or are not comparable with max. 

The main use of mixed aggregation functions is in those situations where 
some inputs positively reinforce each other, while other inputs have negative 
or no reinforcement. This is strongly related to bipolar aggregation, in which 
some inputs are considered as “positive” and others as “negative” evidence, see 
Section [.5] For example, in expert systems such as MYCIN and PROSPEC- 
TOR Bs, [sa], certain pieces of evidence confirm a hypothesis, whereas others 
disconfirm it (these are called certainty factors). This is modeled by positive 
and negative inputs on the scale [−1,1], with 0 being a “neutral” value. We
know, however, that any bounded scale can be transformed into [0,1];
therefore we shall use the inputs from the unit interval, with 1/2 being
the “neutral” value, and interpret the inputs smaller than 1/2 as “negative”
evidence and those larger than 1/2 as “positive” evidence.


Example 4.3. Consider the following rule system
Symptom A confirms diagnosis D (with certainty α);
Symptom B disconfirms diagnosis D (with certainty β);
Symptom C confirms diagnosis D (with certainty γ);
etc.
The inputs are: A (with certainty a), B (with certainty b), C (with
certainty c).
The certainty of the diagnosis D on [−1,1] scale is calculated using
f(g(a, α), −g(b, β), g(c, γ)), where f is conjunctive for negative inputs,
disjunctive for positive inputs and averaging elsewhere, and g is a
different conjunctive aggregation function (e.g., g = min).


A different situation in which the aggregation function needs to be of mixed 
type is when modeling heterogeneous rules like 


If t_1 is A_1 AND (t_2 is A_2 OR t_3 is A_3) THEN ...


and x_1, x_2, … denote the degrees of satisfaction of the rule predicates
t_1 is A_1, t_2 is A_2, etc. Here we want to aggregate the inputs
x_1, x_2, … using a single function f(x), but the rule is a mixture of
conjunction and disjunction.


Consider two specific examples of mixed type aggregation functions. 


Example 4.4. MYCIN is the name of a famous expert system which was one of
the first systems capable of reasoning under uncertainty. Certainty factors,
represented by numbers on the bipolar scale [−1,1], were combined by means
of the function

\[
f(x,y) = \begin{cases}
x + y - xy, & \text{if } \min(x,y) \ge 0, \\
\dfrac{x+y}{1 - \min(|x|,|y|)}, & \text{if } \min(x,y) < 0 < \max(x,y), \\
x + y + xy, & \text{if } \max(x,y) \le 0.
\end{cases} \tag{4.1}
\]

This function is symmetric and associative (recall that the latter implies
that it is defined uniquely for any number of arguments), but does not
define outputs at (−1,1) and (1,−1). It is understood though that the output
is −1 in these cases.

On [0,1] scale it is given as

\[
f(x,y) = \begin{cases}
2(x + y - xy) - 1, & \text{if } \min(x,y) \ge \tfrac12, \\
\dfrac{x+y-1}{1 - \min(|2x-1|,|2y-1|)} + \tfrac12, & \text{if } \min(x,y) < \tfrac12 < \max(x,y), \\
2xy, & \text{if } \max(x,y) \le \tfrac12.
\end{cases} \tag{4.2}
\]


Example 4.5. PROSPECTOR was another pioneering expert system for mineral
exploration [33]. PROSPECTOR’s aggregation function on the scale [−1,1] is
defined as

\[ f(x,y) = \frac{x+y}{1+xy}. \tag{4.3} \]

It is symmetric and associative. It is understood that f(−1,1) = −1.
On [0,1] scale it is given as

\[ f(x,y) = \frac{xy}{xy + (1-x)(1-y)}. \tag{4.4} \]

In both mentioned examples, the aggregation functions exhibited conjunctive
behavior on [0,1/2]^n and disjunctive behavior on [1/2,1]^n. On the rest of
the domain the behavior was averaging. This is not the only way to partition
the domain into disjunctive, conjunctive and averaging parts, as we shall
see later in this Chapter.
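The two combination rules and their transfer to the unit interval can be checked with a few lines of Python. The sketch below is an illustration under our own naming; it implements (4.1) and (4.3) on [−1,1], with the convention f(−1,1) = f(1,−1) = −1 stated in the text, and obtains the [0,1] versions through the change of scale t ↦ 2t − 1 rather than by coding (4.2) and (4.4) directly.

```python
def mycin(x, y):
    """MYCIN combination (4.1) on [-1, 1]; -1 at (-1, 1) and (1, -1)."""
    if min(x, y) >= 0:
        return x + y - x * y
    if max(x, y) <= 0:
        return x + y + x * y
    denom = 1.0 - min(abs(x), abs(y))
    return -1.0 if denom == 0 else (x + y) / denom

def prospector(x, y):
    """PROSPECTOR combination (4.3) on [-1, 1]; -1 at (-1, 1) and (1, -1)."""
    denom = 1.0 + x * y
    return -1.0 if denom == 0 else (x + y) / denom

def to_unit_scale(f):
    """Transfer f from [-1, 1] to [0, 1] via the change of scale t -> 2t - 1."""
    return lambda x, y: (f(2 * x - 1, 2 * y - 1) + 1) / 2

mycin01 = to_unit_scale(mycin)
prospector01 = to_unit_scale(prospector)
```

For instance, mycin01(0.2, 0.3) is about 0.12 (below both inputs), mycin01(0.8, 0.9) is about 0.96 (above both inputs), and mycin01(0.2, 0.9) is about 0.75 (between the inputs), in agreement with (4.2); 1/2 acts as the neutral element of both transformed functions.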

The class of mixed aggregation functions includes many different families. 
Some of them, such as uninorms, nullnorms, compensatory T-S functions 
and ST-OWAs, are related — in some sense which will be detailed later — to 
triangular norms and conorms. Other important families of mixed functions 
are the symmetric sums and some kinds of generated functions. 

Uninorms and nullnorms are two popular families of associative aggregation
functions with clearly defined behavior (see Figure 4.1 for the
two-dimensional case):

Fig. 4.1. Behavior of uninorms (left) and nullnorms (right). Left panel:
conjunctive on [0,e]^2, disjunctive on [e,1]^2, averaging elsewhere. Right
panel: disjunctive on [0,a]^2, conjunctive on [a,1]^2, averaging elsewhere.


• Uninorms are associative aggregation functions that present conjunctive
  behavior when dealing with low input values (those below a given value e
  which is, in addition, the neutral element), have disjunctive behavior for
  high values (those above e) and are averaging otherwise (i.e., when
  receiving a mixture of low and high inputs).

• On the other hand, nullnorms are associative aggregation functions that
  are disjunctive for low values (those below a given value a which is, in
  addition, the absorbing element), conjunctive for high values (those above
  a) and are otherwise averaging (actually, in these cases they provide as
  the output the absorbing element a).


The aggregation functions in Examples 4.4 and 4.5 turn out to be uninorms,
as we shall see in the next section. Of course, not only uninorms and
nullnorms exhibit the behavior illustrated in Fig. 4.1: some generated
functions discussed later in this chapter and some symmetric sums behave as
uninorms (but they are not associative). However, uninorms and nullnorms are
the only types of associative functions with the behavior shown in Fig. 4.1,
so we start with them.


4.2 Uninorms 


Uninorms¹ were introduced as a generalization of t-norms and t-conorms
based on the observation that these two classes of aggregation functions
(see Chapter 3) are defined by means of the same three axioms
(associativity, symmetry and possession of a neutral element), just
differing in the value of the latter, which is 1 for t-norms and 0 for
t-conorms. This observation leads to studies of associative and symmetric
aggregation functions that have a neutral element which may take any value
between the two end points of the unit interval.

Since their introduction, uninorms have proved to be useful for practical
purposes in different situations, and have been applied in different
fields².


4.2.1 Definition 
Uninorms are defined in the bivariate case as follows: 


Definition 4.6 (Uninorm). A uninorm is a bivariate aggregation function
U : [0,1]^2 → [0,1] which is associative, symmetric and has a neutral
element e belonging to the open interval ]0,1[.


Note 4.7. An alternative definition allows the neutral element e to range
over the whole interval [0,1], and thus includes t-norms and t-conorms as
special limiting cases.


Note 4.8. Since uninorms are associative, they are extended in a unique way
to functions with any number of arguments. Thus uninorms constitute a class
of extended aggregation functions (see Definition 1.6).


4.2.2 Main properties 


The general behavior of uninorms is depicted in Figure 4.1, which shows that
a uninorm with neutral element e is conjunctive in the square [0,e]^2 and
disjunctive in the square [e,1]^2. More precisely, uninorms act as t-norms
in [0,e]^2 and as t-conorms in [e,1]^2, that is, any uninorm with neutral
element e is associated with a t-norm T_U and a t-conorm S_U such that:


\[ \forall (x,y) \in [0,e]^2: \quad U(x,y) = e \cdot T_U\!\left(\frac{x}{e}, \frac{y}{e}\right), \]

\[ \forall (x,y) \in [e,1]^2: \quad U(x,y) = e + (1-e) \cdot S_U\!\left(\frac{x-e}{1-e}, \frac{y-e}{1-e}\right). \]

The functions T_U and S_U are usually referred to as the underlying t-norm
and t-conorm related to the uninorm U.

1 Uninorms appeared under this name in 1996 [278], but a special class of
them (nowadays known as the class of representable uninorms) was first
studied in 1982 [7].

2 For example, expert systems or fuzzy systems modeling [269].

On the remaining parts of the unit square, uninorms have averaging
behavior, i.e.:

∀(x, y) ∈ [0,e] × [e,1] ∪ [e,1] × [0,e]:  min(x, y) ≤ U(x, y) ≤ max(x, y).

Note that contrary to what happens in the squares [0,e]^2 and [e,1]^2, the
behavior of uninorms in the rest of the unit square is not tied to any
specific class of (averaging) functions.
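The structure just described can be explored numerically. The Python sketch below uses PROSPECTOR's function (4.4) on [0,1] as the uninorm, with neutral element e = 1/2 and the convention U(0,1) = U(1,0) = 0 corresponding to f(−1,1) = −1 in Example 4.5. The helper names are ours; the two helpers simply invert the rescaling formulas above to recover T_U and S_U from U.

```python
E = 0.5  # neutral element of the uninorm below

def three_pi(x, y):
    """PROSPECTOR's function (4.4) on [0, 1]; value 0 at (0, 1) and (1, 0)."""
    num = x * y
    den = num + (1.0 - x) * (1.0 - y)
    return 0.0 if den == 0 else num / den

def underlying_tnorm(u, e):
    """T_U(x, y) = U(e x, e y) / e, the rescaled restriction of U to [0, e]^2."""
    return lambda x, y: u(e * x, e * y) / e

def underlying_tconorm(u, e):
    """S_U(x, y) = (U(e + (1-e) x, e + (1-e) y) - e) / (1 - e)."""
    return lambda x, y: (u(e + (1 - e) * x, e + (1 - e) * y) - e) / (1 - e)

T_U = underlying_tnorm(three_pi, E)
S_U = underlying_tconorm(three_pi, E)
```

A direct calculation shows that for this uninorm T_U(x, y) = xy/(1 + (1 − x)(1 − y)); T_U has neutral element 1, S_U has neutral element 0, and for mixed inputs such as (0.2, 0.9) the value of U lies between the two inputs.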

The structure of uninorms that has just been described is summarized in
Figure 4.2 using the following notation:

\[ T_U^*(x,y) = e \cdot T_U\!\left(\frac{x}{e}, \frac{y}{e}\right), \qquad
   S_U^*(x,y) = e + (1-e) \cdot S_U\!\left(\frac{x-e}{1-e}, \frac{y-e}{1-e}\right). \]

Fig. 4.2. Structure of a uninorm with neutral element e, underlying t-norm
T_U and underlying t-conorm S_U: conjunctive (T_U^*) on [0,e]^2, disjunctive
(S_U^*) on [e,1]^2, and averaging (min ≤ U ≤ max) elsewhere.


Other interesting properties of uninorms are the following: 


Absorbing element For any uninorm U(0,1) € {0,1}. This allows one to 
classify uninorms into two different categories: 
e Conjunctive uninorms are uninorms which verify U(0,1) = 0. These 
uninorms have absorbing element a = 0 (due to monotonicity). 
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e Disjunctive uninorms are uninorms which verify U(0,1) = 1. These 
uninorms have absorbing element a = 1 (due to monotonicity). 
Duality The class of uninorms is closed under duality, that is, the dual of 
any uninorm U with respect to an arbitrary strong negation N, defined 

as 
Ua(z,y) = N(U(N (z), N(y))), 


is also a uninorm, and it has the following properties: 

e IfU has neutral element e, Uq has neutral element N(e). 

e If U is a conjunctive (respectively disjunctive) uninorm, then Uy is a 
disjunctive (resp. conjunctive) uninorm. 

Clearly, no uninorm can be self-dual, since at least the values U (0,1) and 

Ua(0, 1) = N(U(1,0)) = N(U(0,1)) will be different. 

Continuity Uninorms are never continuous on the whole unit square. Nev- 
ertheless, it is possible to find uninorms that are continuous on the open 
square ]0,1[?. Moreover, there are uninorms which are almost continu- 
ous, i.e., which are continuous everywhere except at the corners (0,1) and 
(1,0). These are called representable uninorms, see Section [4.2.3 

Idempotency Recall from Chapter] that the only idempotent t-norms and 
t-conorms are minimum and maximum respectively. In the case of uni- 
norms, there are different kinds of idempotent function] that have been 
found and characterized (64) Ezo. 

Strict monotonicity The existence of an absorbing element prevents
uninorms from being strictly monotone in the whole unit square.
Notwithstanding, some of them (such as the already mentioned representable
uninorms) are strictly monotone on the open square $]0,1[^2$.


4.2.3 Main classes of uninorms 


There are several classes of uninorms that have been identified and character- 
ized. Two of the most important and useful ones are described below. 


The families Umin and Umax 


In Section 4.2.2 we saw that in the region $[0,e] \times [e,1] \cup [e,1] \times [0,e]$ uninorms
have averaging behavior. This raises the question of whether it is possible to
have uninorms acting on this region exactly as the limiting averaging functions
min and max. The answer is affirmative; this idea provides two important
families of conjunctive and disjunctive uninorms, known, respectively, as $U_{\min}$
and $U_{\max}$ (see Figure 4.3).


Proposition 4.9 (Uninorms $U_{\min}$ and $U_{\max}$).


3 Of course, because of monotonicity (see p. D) such idempotent uninorms are, 
actually, averaging functions instead of mixed ones. 
Fig. 4.3. Structure of the families of uninorms $U_{\min}$ (left) and $U_{\max}$ (right) as
defined on p. 202.


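• Let T be an arbitrary t-norm, S be an arbitrary t-conorm and $e \in ]0,1[$.
The function

$$U_{\min(T,S,e)}(x,y) = \begin{cases} e \cdot T\left(\frac{x}{e}, \frac{y}{e}\right), & \text{if } (x,y) \in [0,e]^2, \\ e + (1-e) \cdot S\left(\frac{x-e}{1-e}, \frac{y-e}{1-e}\right), & \text{if } (x,y) \in [e,1]^2, \\ \min(x,y) & \text{otherwise} \end{cases}$$

is a conjunctive uninorm with the neutral element e, and the family of all
such uninorms is denoted by $U_{\min}$.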

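• Let T be an arbitrary t-norm, S be an arbitrary t-conorm and $e \in ]0,1[$.
The function

$$U_{\max(T,S,e)}(x,y) = \begin{cases} e \cdot T\left(\frac{x}{e}, \frac{y}{e}\right), & \text{if } (x,y) \in [0,e]^2, \\ e + (1-e) \cdot S\left(\frac{x-e}{1-e}, \frac{y-e}{1-e}\right), & \text{if } (x,y) \in [e,1]^2, \\ \max(x,y) & \text{otherwise} \end{cases}$$

is a disjunctive uninorm with the neutral element e, and the family of all
such uninorms is denoted by $U_{\max}$.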


Observe that uninorms in the two mentioned families satisfy the following
properties:
• If $U \in U_{\min}$ then the section $U_1$, given by $t \mapsto U(1,t)$, is continuous on
$[0,e[$.
• If $U \in U_{\max}$ then the section $U_0$, given by $t \mapsto U(0,t)$, is continuous on
$]e,1]$.
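The construction in Proposition 4.9 can be sketched in a few lines of Python. This is only an illustration: the factory name `make_uninorm`, the helper names and the choice of the product t-norm with the probabilistic sum are assumptions made for the example, not part of the proposition.

```python
from typing import Callable

Binary = Callable[[float, float], float]


def make_uninorm(t_norm: Binary, t_conorm: Binary, e: float, mixed: Binary) -> Binary:
    """Bivariate uninorm of Proposition 4.9.

    mixed=min gives a member of U_min, mixed=max a member of U_max.
    """
    if not 0.0 < e < 1.0:
        raise ValueError("the neutral element e must lie in ]0, 1[")

    def uninorm(x: float, y: float) -> float:
        if x <= e and y <= e:  # scaled t-norm on [0, e]^2
            return e * t_norm(x / e, y / e)
        if x >= e and y >= e:  # scaled t-conorm on [e, 1]^2
            return e + (1 - e) * t_conorm((x - e) / (1 - e), (y - e) / (1 - e))
        return mixed(x, y)     # min or max on the remaining region

    return uninorm


# Illustrative underlying functions (an assumption of this sketch).
def product(x: float, y: float) -> float:
    return x * y


def prob_sum(x: float, y: float) -> float:
    return x + y - x * y


u_min = make_uninorm(product, prob_sum, 0.5, min)
u_max = make_uninorm(product, prob_sum, 0.5, max)
print(u_min(0.2, 0.9), u_max(0.2, 0.9))  # mixed inputs: min versus max
```

Passing `min` or `max` as the last argument is the only difference between the two families, which mirrors the statement of the proposition.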
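Note 4.10. The associativity allows one to determine uniquely the n-ary extensions
of the families $U_{\min}$ and $U_{\max}$, which are the following:

$$U_{\min(T,S,e)}(x_1,\dots,x_n) = \begin{cases} e \cdot T\left(\frac{x_1}{e},\dots,\frac{x_n}{e}\right), & \text{if } (x_1,\dots,x_n) \in [0,e]^n, \\ e + (1-e) \cdot S\left(\frac{x_1-e}{1-e},\dots,\frac{x_n-e}{1-e}\right), & \text{if } (x_1,\dots,x_n) \in [e,1]^n, \\ \min(x_1,\dots,x_n) & \text{otherwise,} \end{cases}$$

$$U_{\max(T,S,e)}(x_1,\dots,x_n) = \begin{cases} e \cdot T\left(\frac{x_1}{e},\dots,\frac{x_n}{e}\right), & \text{if } (x_1,\dots,x_n) \in [0,e]^n, \\ e + (1-e) \cdot S\left(\frac{x_1-e}{1-e},\dots,\frac{x_n-e}{1-e}\right), & \text{if } (x_1,\dots,x_n) \in [e,1]^n, \\ \max(x_1,\dots,x_n) & \text{otherwise.} \end{cases}$$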


Representable uninorms 


We saw in Chapter 3 that there are t-norms and t-conorms (in particular,
the continuous Archimedean ones) that may be represented in terms of some
specific single-variable functions known as their additive (or multiplicative)
generators. A similar property exists for uninorms, i.e., there is a class of
uninorms (including both conjunctive and disjunctive ones), usually known as
representable uninorms⁴, that can be built by means of univariate generators:


Definition 4.11 (Representable uninorms). Let $u : [0,1] \to [-\infty, +\infty]$
be a strictly increasing bijection⁵ such that $u(e) = 0$ for some $e \in ]0,1[$.

• The function given by

$$U(x,y) = \begin{cases} u^{-1}(u(x) + u(y)), & \text{if } (x,y) \in [0,1]^2 \setminus \{(0,1),(1,0)\}, \\ 0 & \text{otherwise} \end{cases}$$

is a conjunctive uninorm with the neutral element e, known as a conjunctive
representable uninorm.
• The function given by

$$U(x,y) = \begin{cases} u^{-1}(u(x) + u(y)), & \text{if } (x,y) \in [0,1]^2 \setminus \{(0,1),(1,0)\}, \\ 1 & \text{otherwise} \end{cases}$$

is a disjunctive uninorm with the neutral element e, known as a disjunctive
representable uninorm.

The function u is called an additive generator of the uninorm U and it is
determined up to a positive multiplicative constant⁶.


Observe that each u provides two different uninorms, a conjunctive one 
and a disjunctive one, that differ only in the corners (0,1) and (1,0). 


Note 4.12. Similarly to the case of t-norms and t-conorms, representable uninorms
may also be described by means of multiplicative generators⁷.


4 Also known as generated uninorms, compare with Section 4.4.

5 Note that such a function will verify $u(0) = -\infty$ and $u(1) = +\infty$.

6 That is, if u is an additive generator of U, then so is $v(t) = c\,u(t)$, $c > 0$.

7 See, for example, Ha, where representable uninorms are called “associative
compensatory operators”.
Note 4.13. In the case of n arguments, representable uninorms can be built from an
additive generator u as follows:

For inputs not containing simultaneously the values 0 and 1, that is, for tuples
$(x_1,\dots,x_n)$ belonging to $[0,1]^n \setminus \{(x_1,\dots,x_n) : \{0,1\} \subseteq \{x_1,\dots,x_n\}\}$:

$$U(x_1,\dots,x_n) = u^{-1}\left(\sum_{i=1}^{n} u(x_i)\right).$$

Otherwise, that is, for tuples containing simultaneously the values 0 and 1:
- $U(x_1,\dots,x_n) = 0$ for a conjunctive uninorm;
- $U(x_1,\dots,x_n) = 1$ for a disjunctive uninorm.
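The n-ary rule of Note 4.13 can be sketched as follows. The factory `representable_uninorm` is a hypothetical helper written for this illustration, and the generator used to exercise it, $u(t) = \log(t/(1-t))$ with neutral element 0.5, is just one convenient choice.

```python
import math
from typing import Callable, Sequence


def representable_uninorm(u: Callable[[float], float],
                          u_inv: Callable[[float], float],
                          conjunctive: bool = True):
    """Build the n-ary uninorm U(x) = u_inv(sum of u(x_i)) from an additive generator."""

    def uninorm(xs: Sequence[float]) -> float:
        if 0.0 in xs and 1.0 in xs:
            # -inf + inf is undefined: fall back on the convention for U(0, 1)
            return 0.0 if conjunctive else 1.0
        return u_inv(sum(u(x) for x in xs))

    return uninorm


def logit(t: float) -> float:
    """Example generator with u(0) = -inf, u(0.5) = 0, u(1) = +inf."""
    if t <= 0.0:
        return -math.inf
    if t >= 1.0:
        return math.inf
    return math.log(t / (1.0 - t))


def logistic(s: float) -> float:
    """Inverse of logit, written so that large |s| does not overflow."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


conj = representable_uninorm(logit, logistic, conjunctive=True)
disj = representable_uninorm(logit, logistic, conjunctive=False)
print(conj([0.8, 0.9, 0.5]), conj([0.0, 1.0, 0.4]), disj([0.0, 1.0, 0.4]))
```

The only place where the conjunctive and disjunctive versions differ is the explicit check for tuples that contain both 0 and 1.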


Proposition 4.14 (Properties of representable uninorms).

1. Representable uninorms are almost continuous (i.e., continuous on
$[0,1]^2 \setminus \{(0,1),(1,0)\}$). Moreover, they are the only uninorms verifying this
property (a uninorm is almost continuous if and only if it is representable).

2. The function $N_u : [0,1] \to [0,1]$ given by $N_u(t) = u^{-1}(-u(t))$ is a strong
negation with fixed point e, and U is self-dual, excluding the points (0,1)
and (1,0), with respect to $N_u$, that is:

$$U(x,y) = N_u(U(N_u(x), N_u(y))) \quad \forall (x,y) \in [0,1]^2 \setminus \{(0,1),(1,0)\}.$$

3. Representable uninorms verify the following equalities and inequalities (note
that the latter may be understood as a generalization of the Archimedean
property of continuous t-norms and t-conorms):

a) $\forall t \in [0,1[ : U(t,0) = 0$;
b) $\forall t \in ]0,1] : U(t,1) = 1$;
c) $\forall t \in ]0,1[ : U(t, N_u(t)) = e$;
d) $\forall t \in ]0,e[ : U(t,t) < t$;
e) $\forall t \in ]e,1[ : U(t,t) > t$.

4. Since their generators u are strictly increasing, representable uninorms
are strictly increasing on $]0,1[^2$.

5. Strict monotonicity, along with almost continuity, ensures that the underlying
t-norm and t-conorm of representable uninorms are necessarily strict
(see Section 3.4.3). Moreover:

• If U is a representable uninorm with additive generator u and neutral
element e, then the functions $g, h : [0,1] \to [0,+\infty]$, given by $g(t) = -u(e \cdot t)$
and $h(t) = u(e + (1-e) \cdot t)$, are additive generators, respectively,
of the underlying strict t-norm $T_U$ and strict t-conorm $S_U$.

• Conversely, given a value $e \in ]0,1[$, a strict t-norm T with an additive
generator g and a strict t-conorm S with an additive generator h, the
mapping $u : [0,1] \to [-\infty,+\infty]$ defined by

$$u(t) = \begin{cases} -g\left(\frac{t}{e}\right), & \text{if } t \in [0,e], \\ h\left(\frac{t-e}{1-e}\right), & \text{if } t \in ]e,1], \end{cases}$$

is an additive generator of a representable uninorm with the neutral
element e and underlying functions T and S.


Note 4.15. With regard to the last statement, it is interesting to point out⁸ that
given a value $e \in ]0,1[$, a strict t-norm T and a strict t-conorm S, the triplet (e, T, S)
does not determine a unique (conjunctive or disjunctive) representable uninorm with
neutral element e, but rather a family of them. This is due to the fact that additive
generators of t-norms and t-conorms (see Proposition 3.43) are unique only up to
a multiplicative positive constant, and then one can choose among the different
additive generators of T and S. The choice of these generators does not affect the
behavior of the corresponding uninorms on the squares $[0,e]^2$ and $[e,1]^2$ (T and
S always remain the underlying t-norm and t-conorm), but it does influence the
behavior obtained on the remaining parts of the domain. For example, let g and h
be additive generators of a strict t-norm T and a strict t-conorm S respectively, and
take $e \in ]0,1[$. Then, for each k > 0, the function

$$u_k(t) = \begin{cases} -k \cdot g\left(\frac{t}{e}\right), & \text{if } t \in [0,e], \\ h\left(\frac{t-e}{1-e}\right), & \text{if } t \in ]e,1], \end{cases}$$

provides a representable uninorm $U_k$. All the members of the family $\{U_k\}_{k>0}$ have
neutral element e, underlying t-norm T and underlying t-conorm S, but differ on
the region $[0,e] \times [e,1] \cup [e,1] \times [0,e]$.
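A small numerical experiment makes the point of Note 4.15 concrete. The sketch below assumes the product t-norm with generator $g(t) = -\log t$ and the probabilistic sum with generator $h(t) = -\log(1-t)$; the function name `make_uk` is hypothetical. Two members of the family agree on $[0,e]^2$ and $[e,1]^2$ but not on mixed inputs.

```python
import math


def make_uk(e: float, k: float):
    """Conjunctive representable uninorm U_k built from
    u_k(t) = -k*g(t/e) on [0, e] and h((t - e)/(1 - e)) on ]e, 1],
    with g(t) = -log(t) and h(t) = -log(1 - t) (illustrative choices)."""

    def u(t: float) -> float:
        if t <= 0.0:
            return -math.inf
        if t >= 1.0:
            return math.inf
        if t <= e:
            return k * math.log(t / e)             # equals -k * g(t / e)
        return -math.log((1.0 - t) / (1.0 - e))    # equals h((t - e) / (1 - e))

    def u_inv(s: float) -> float:
        if s <= 0.0:
            return e * math.exp(s / k)
        return 1.0 - (1.0 - e) * math.exp(-s)

    def uninorm(x: float, y: float) -> float:
        if {x, y} == {0.0, 1.0}:
            return 0.0                             # conjunctive convention
        return u_inv(u(x) + u(y))

    return uninorm


u1 = make_uk(0.5, 1.0)
u2 = make_uk(0.5, 2.0)
print(u1(0.2, 0.4), u2(0.2, 0.4))   # same value on [0, e]^2
print(u1(0.6, 0.9), u2(0.6, 0.9))   # same value on [e, 1]^2
print(u1(0.2, 0.8), u2(0.2, 0.8))   # different values on the mixed region
```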


4.2.4 Examples 


Example 4.16 (The weakest and the strongest uninorms). Given $e \in ]0,1[$, the
weakest and the strongest uninorms with neutral element e are given below
(the structure and a 3D plot of these uninorms can be viewed, respectively,
in Figures 4.4 and 4.5):

• The weakest uninorm with neutral element e is the conjunctive uninorm
belonging to $U_{\min}$ built by means of the weakest t-norm (the drastic product
$T_D$) and the weakest t-conorm, max:

$$U_{\min(T_D,\max,e)}(x,y) = \begin{cases} 0, & \text{if } (x,y) \in [0,e[^2, \\ \max(x,y), & \text{if } (x,y) \in [e,1]^2, \\ \min(x,y) & \text{otherwise.} \end{cases}$$

• The strongest uninorm with the neutral element e is the disjunctive uninorm
belonging to $U_{\max}$ built by means of the strongest t-norm, min, and
the strongest t-conorm, $S_D$:

$$U_{\max(\min,S_D,e)}(x,y) = \begin{cases} \min(x,y), & \text{if } (x,y) \in [0,e]^2, \\ 1, & \text{if } (x,y) \in ]e,1]^2, \\ \max(x,y) & \text{otherwise.} \end{cases}$$


8 For details on this, see 141, [1g4}. 
Fig. 4.4. Structure of the weakest (left) and the strongest (right) bivariate uninorms
with neutral element $e \in ]0,1[$ (Example 4.16, p. 206).


Fig. 4.5. 3D plots of the weakest (left) and the strongest (right) bivariate uninorms
with neutral element e = 0.5 (Example 4.16, p. 206).


Example 4.17 (Idempotent uninorms in $U_{\min}$ and $U_{\max}$). Other commonly
cited examples of uninorms are the ones obtained from the families $U_{\min}$ and
$U_{\max}$ by choosing T = min and S = max (see Figures 4.6 and 4.7):

$$U_{\min(\min,\max,e)}(x,y) = \begin{cases} \max(x,y), & \text{if } (x,y) \in [e,1]^2, \\ \min(x,y) & \text{otherwise.} \end{cases}$$

$$U_{\max(\min,\max,e)}(x,y) = \begin{cases} \min(x,y), & \text{if } (x,y) \in [0,e]^2, \\ \max(x,y) & \text{otherwise.} \end{cases}$$

Note that the above examples are idempotent, and are, as a consequence,
averaging functions.

Fig. 4.6. Structure of the idempotent uninorms (conjunctive on the left, disjunctive
on the right) in the families $U_{\min}$ and $U_{\max}$ (Example 4.17, p. 207).


Fig. 4.7. 3D plot of the idempotent uninorms in the families $U_{\min}$ and $U_{\max}$ with
the neutral element e = 0.5 (Example 4.17, p. 207).


Example 4.18. An important family of parameterized representable uninorms
is given by fod, Ha, see also [za],

$$U_\lambda(x,y) = \frac{\lambda x y}{\lambda x y + (1-x)(1-y)},$$

where $\lambda \in ]0,+\infty[$ and either $U_\lambda(0,1) = U_\lambda(1,0) = 0$ (in which case $U_\lambda$ is
conjunctive) or $U_\lambda(0,1) = U_\lambda(1,0) = 1$ (and then $U_\lambda$ is disjunctive).

$U_\lambda$ has neutral element $e_\lambda = \frac{1}{1+\lambda}$ and it can be obtained by means of the
additive generator

$$u_\lambda(t) = \log\left(\frac{\lambda t}{1-t}\right).$$

The corresponding underlying t-norm and t-conorm are:

$$T_{U_\lambda}(x,y) = \frac{\lambda x y}{\lambda + 1 - (x + y - xy)}$$

and

$$S_{U_\lambda}(x,y) = \frac{x + y + (\lambda - 1)xy}{1 + \lambda x y},$$

which belong to the Hamacher family (see Chapter 3): indeed, a simple
calculation shows that $T_{U_\lambda}$ is the Hamacher t-norm $T^H_{\frac{\lambda+1}{\lambda}}$, whereas $S_{U_\lambda}$ is the
Hamacher t-conorm $S^H_{\lambda+1}$.
Example 4.19 (The 3-Π function).
Taking λ = 1 in the above example provides a well-known representable
uninorm, usually referred to as the 3-Π function, which, in the general n-ary
case is written as:

$$U(x_1,\dots,x_n) = \frac{\prod_{i=1}^{n} x_i}{\prod_{i=1}^{n} x_i + \prod_{i=1}^{n} (1 - x_i)},$$

with the convention $\frac{0}{0} = 0$ if one wants to obtain a conjunctive uninorm, and
choosing $\frac{0}{0} = 1$ in order to obtain a disjunctive one. Its additive generator is

$$u(t) = \log\left(\frac{t}{1-t}\right).$$

The 3-Π function is a special case of Dombi’s aggregative operator (zg).
Note that PROSPECTOR’s aggregation function in Example 4.5 is precisely
the 3-Π function when using the [0, 1] scale.
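The closed form of the 3-Π function is easy to evaluate directly. The following is a minimal sketch; the function name `three_pi` and its `conjunctive` flag are choices made for this illustration. Note that with many very small or very large inputs the two products can underflow to zero in floating point, in which case the 0/0 convention would be triggered spuriously; evaluating through the additive generator avoids that.

```python
import math
from typing import Sequence


def three_pi(xs: Sequence[float], conjunctive: bool = True) -> float:
    """n-ary 3-Pi function with the 0/0 convention of Example 4.19."""
    p = math.prod(xs)                      # product of the inputs
    q = math.prod(1.0 - x for x in xs)     # product of the complements
    if p + q == 0.0:
        # some input equals 0 and another equals 1: the 0/0 case
        return 0.0 if conjunctive else 1.0
    return p / (p + q)


print(three_pi([0.8, 0.9]))   # both inputs above e = 0.5: reinforced upwards
print(three_pi([0.2, 0.1]))   # both inputs below e = 0.5: reinforced downwards
print(three_pi([0.5, 0.3]))   # 0.5 is neutral, so the result is (close to) 0.3
```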


Example 4.20 (MYCIN’s aggregation function). MYCIN’s aggregation function
on [0,1] (cf. Example 4.4) is a conjunctive representable uninorm with
an additive generator

$$u(t) = \begin{cases} \log(2t), & \text{if } t \le \frac{1}{2}, \\ -\log(2 - 2t) & \text{otherwise.} \end{cases} \qquad (4.5)$$

It has neutral element $e = \frac{1}{2}$, and the inverse of u is given as

$$u^{-1}(t) = \begin{cases} \frac{e^{t}}{2}, & \text{if } t \le 0, \\ 1 - \frac{e^{-t}}{2} & \text{otherwise.} \end{cases} \qquad (4.6)$$

Thus evaluation of MYCIN’s function for any number of inputs can be done
by using

$$U(\mathbf{x}) = u^{-1}\left(\sum_{i=1}^{n} u(x_i)\right),$$

with u and $u^{-1}$ given in (4.5) and (4.6).
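Formulas (4.5) and (4.6) translate directly into code. The sketch below is one possible rendering (the names `mycin_u`, `mycin_u_inv` and `mycin` are hypothetical); the endpoints 0 and 1 are mapped to $-\infty$ and $+\infty$ explicitly because the logarithm is not defined there.

```python
import math
from typing import Sequence


def mycin_u(t: float) -> float:
    """Additive generator (4.5), extended by u(0) = -inf and u(1) = +inf."""
    if t <= 0.0:
        return -math.inf
    if t >= 1.0:
        return math.inf
    return math.log(2.0 * t) if t <= 0.5 else -math.log(2.0 - 2.0 * t)


def mycin_u_inv(s: float) -> float:
    """Inverse generator (4.6)."""
    return math.exp(s) / 2.0 if s <= 0.0 else 1.0 - math.exp(-s) / 2.0


def mycin(xs: Sequence[float]) -> float:
    """MYCIN's function on [0, 1] for any number of inputs (conjunctive)."""
    if 0.0 in xs and 1.0 in xs:
        return 0.0
    return mycin_u_inv(sum(mycin_u(x) for x in xs))


print(mycin([0.75, 0.75]))         # two supporting pieces of evidence
print(mycin([0.25, 0.75]))         # opposite evidence of equal strength
print(mycin([0.6, 0.7, 0.8, 0.3]))
```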
Fig. 4.8. 3D plots of the 3-Π (left) and MYCIN’s (right) uninorms. The two
intersecting straight lines illustrate the presence of the neutral element.


Example 4.21. An interesting example arises in the case of $u(t) = -\log(\log_\lambda(t))$,
where $\lambda \in (0,1)$. u is a strictly increasing bijection, and $u(\lambda) = 0$. The inverse
is $u^{-1}(t) = \lambda^{\exp(-t)}$. Then

$$U(\mathbf{x}) = \lambda^{\prod_{i=1}^{n} \log_\lambda x_i} = \exp\left( (\log \lambda)^{1-n} \prod_{i=1}^{n} \log x_i \right)$$

is a representable uninorm with the neutral element λ⁹. In the special case
n = 2 we obtain an alternative compact expression

$$U(x,y) = x^{\log_\lambda y} = y^{\log_\lambda x},$$

and, of course, we can have either conjunctive ($U(0,1) = U(1,0) = 0$) or
disjunctive ($U(0,1) = U(1,0) = 1$) cases (compare to the mean in Example
2.26). The limiting cases of λ = 1 and λ = 0 correspond to drastic t-norm
and t-conorm respectively.

The corresponding underlying (strict) t-norm and t-conorm are given,
respectively, by the additive generators $g(t) = -u(\lambda t) = \log(\log_\lambda(\lambda t))$ and
$h(t) = u(\lambda + (1-\lambda)t) = -\log(\log_\lambda(\lambda + (1-\lambda)t))$. Interestingly, the underlying
t-norm is related to U by the equation

$$T_U(x,y) = xy\, e^{\frac{\log x \log y}{\log \lambda}} = xy\, U(x,y).$$

The t-norm $T_U$ is a Gumbel-Barnett copula (see Example 3.119, part 4) for
$\lambda \in ]0, \frac{1}{e}]$, with parameter $a = -\frac{1}{\log \lambda}$.


9 To avoid confusion with the Euler number e, we denote the neutral element by λ
in this example.
4.2.5 Calculation


Calculation of numerical values of uninorms can be performed recursively, 
or, for representable uninorms, by using their additive generators. Similarly 
to the case of t-norms and t-conorms, care should be taken with respect 
to evaluation of functions with asymptotic behavior, as well as numerical 
underflow and overflow. 

However, evaluation of uninorms presents an additional challenge: these
functions are discontinuous, at the very least at the points (0,1) and (1,0).
Near these points the outputs are unstable with respect to
input inaccuracies. For example, for a conjunctive uninorm (like MYCIN’s or
PROSPECTOR’s functions), the value f(0,1) = 0, but f(ε,1) = 1 for any
ε > 0, like 0.00001.
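The instability is easy to reproduce numerically. The sketch below uses the bivariate conjunctive 3-Π function of Example 4.19 as the test function (the name `three_pi_conj` is chosen for this illustration) and evaluates it at three points that all lie within 0.00001 of the corner (0,1).

```python
def three_pi_conj(x: float, y: float) -> float:
    """Bivariate conjunctive 3-Pi function (0/0 is taken to be 0)."""
    p = x * y
    q = (1.0 - x) * (1.0 - y)
    return 0.0 if p + q == 0.0 else p / (p + q)


eps = 1e-5
print(three_pi_conj(0.0, 1.0))        # exactly at the corner
print(three_pi_conj(eps, 1.0))        # first input perturbed
print(three_pi_conj(eps, 1.0 - eps))  # both inputs perturbed
```

The three outputs are 0, 1 and approximately 0.5, so an input error of the order of 0.00001 can move the result across the whole unit interval. In an application it may therefore be worth checking how close the inputs are to (0,1) or (1,0) before trusting the output.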

Evaluation using additive generators also has its specifics. For t-norms and
t-conorms, an additive generator could be multiplied by any positive constant
without changing the value of the t-norm. This is also true for the generators
of the representable uninorms. However, when an additive generator is defined
piecewise, like in Example 4.20, one may be tempted to multiply just one part
of the expression aiming at numerical stability. While this does not affect the
values of the underlying t-norm or t-conorm in the regions $[0,e]^n$ and $[e,1]^n$,
it certainly affects the values in the rest of the domain (see Note 4.15).


4.2.6 Fitting to the data 


We examine the problem of choosing the most suitable uninorm based on
empirical data, following the general approach outlined in Section 1.6. We
have a set of empirical data, pairs $(\mathbf{x}_k, y_k)$, $k = 1,\dots,K$, which we want to
fit as best as possible by using a uninorm. Our goal is to determine the best
function from that class that minimizes the norm of the differences between
the predicted ($f(\mathbf{x}_k)$) and observed ($y_k$) values. We will use the least squares
or least absolute deviation criterion, as discussed on p. [B3] 

For uninorms, this problem has two aspects: fitting the actual uninorm 
and choosing the value of the neutral element. First we consider the first part, 
assuming e is fixed or given. Then we discuss fitting the value of e. 

If the uninorm is given algebraically, e.g., through parametric families of
the underlying t-norms and t-conorms, then we have a generic nonlinear
optimization problem of fitting the parameter(s) to the data. In doing so,
one has to be aware that even if the underlying t-norm and t-conorm are
specified, there is a degree of freedom for the values outside $[0,e]^n$ and $[e,1]^n$.
Therefore a rule for how the values in this part of the domain are determined
has to be specified.

For representable uninorms, there is an alternative method based on the
additive generators. It can be applied in parametric and non-parametric form.
In parametric form, when the algebraic forms of the additive generators of
the underlying t-norm and t-conorm are specified, we aim at fitting three
parameters, $\lambda_T$, $\lambda_S$ and α: the first two are the parameters identifying a
specific member of the families of t-norms/t-conorms, and α is the parameter
which determines the values outside $[0,e]^n$ and $[e,1]^n$.

Specifically, let the underlying t-norm and t-conorm have additive generators
$g_{\lambda_T}$ and $h_{\lambda_S}$. The additive generator of the uninorm with the neutral
element e has the form

$$u(t; \lambda_T, \lambda_S, \alpha) = \begin{cases} -g_{\lambda_T}\left(\frac{t}{e}\right), & \text{if } t \le e, \\ \alpha\, h_{\lambda_S}\left(\frac{t-e}{1-e}\right) & \text{otherwise.} \end{cases}$$

Note that for any α > 0, u generates a uninorm with exactly the same underlying
t-norm and t-conorm, but the values outside $[0,e]^n$ and $[e,1]^n$ depend on
α. Therefore, the least squares or LAD criterion has to be minimized with respect
to variables $\lambda_T$, $\lambda_S$ and α > 0. This is a nonlinear optimization problem
with possibly multiple locally optimal solutions. We recommend using global
optimization methods discussed in Appendices A.5.4 and A.5.5.
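To illustrate the role of α, the sketch below fits only this parameter by least squares. Several simplifications are assumptions of the example rather than part of the method: the neutral element is fixed at e = 0.5, the underlying functions are fixed as the product t-norm ($g(t) = -\log t$) and the probabilistic sum ($h(t) = -\log(1-t)$), so there are no $\lambda_T$, $\lambda_S$ to fit, the data are synthetic and noise-free, and a plain grid search stands in for the global optimization methods recommended above.

```python
import math
import random

E = 0.5  # neutral element, assumed known in this sketch


def make_uninorm(alpha: float, e: float = E):
    """Conjunctive uninorm generated by u(t) = -g(t/e) for t <= e and
    alpha * h((t - e)/(1 - e)) otherwise, with g = -log and h(t) = -log(1 - t)."""

    def u(t: float) -> float:
        if t <= 0.0:
            return -math.inf
        if t >= 1.0:
            return math.inf
        if t <= e:
            return math.log(t / e)
        return -alpha * math.log((1.0 - t) / (1.0 - e))

    def u_inv(s: float) -> float:
        if s <= 0.0:
            return e * math.exp(s)
        return 1.0 - (1.0 - e) * math.exp(-s / alpha)

    def uninorm(x: float, y: float) -> float:
        if {x, y} == {0.0, 1.0}:
            return 0.0
        return u_inv(u(x) + u(y))

    return uninorm


def sse(alpha: float, data) -> float:
    """Least squares criterion for a given alpha."""
    f = make_uninorm(alpha)
    return sum((f(x, y) - z) ** 2 for x, y, z in data)


def fit_alpha(data, grid):
    """Grid search over alpha (a stand-in for a proper global optimizer)."""
    return min(grid, key=lambda a: sse(a, data))


# Synthetic data generated with alpha = 2 (hypothetical, for illustration only).
rng = random.Random(0)
target = make_uninorm(2.0)
points = [(rng.random(), rng.random()) for _ in range(200)]
data = [(x, y, target(x, y)) for x, y in points]

alpha_hat = fit_alpha(data, [0.25 * i for i in range(1, 41)])
print(alpha_hat)
```

Only the data points with one input below e and one above it carry information about α; points inside $[0,e]^2$ or $[e,1]^2$ give the same prediction for every α.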

The nonparametric approach is similar to the one used to fit additive
generators of t-norms in Section 3.4.15. We represent the additive generator
with a monotone increasing regression spline

$$S(t) = \sum_{j=1}^{J} c_j B_j(t), \qquad (4.7)$$

with the basis functions $B_j$ chosen in such a way that the monotonicity condition
is expressed as a set of linear inequalities (see IE! [15}). Our goal is
to determine from the data the unknown coefficients $c_j \ge 0$ (for monotone
increasing S), subject to conditions

$$S(e) = \sum_{j=1}^{J} c_j B_j(e) = 0, \qquad S(a) = \sum_{j=1}^{J} c_j B_j(a) = 1.$$

Since the underlying t-norm and t-conorms are necessarily strict, we use the
well-founded generators [136, 137], defined as

$$u(t) = \begin{cases} -\left(\frac{1}{t} + S(\varepsilon_1) - \frac{1}{\varepsilon_1}\right), & \text{if } t \le \varepsilon_1, \\ \frac{1}{1-t} + S(1 - \varepsilon_2) - \frac{1}{\varepsilon_2}, & \text{if } 1 - t \le \varepsilon_2, \\ S(t) & \text{otherwise.} \end{cases} \qquad (4.8)$$

The values of $\varepsilon_1$, $\varepsilon_2$ are chosen in such a way that $\varepsilon_1$ is smaller than the smallest
strictly positive number out of $x_{ik}$, $y_k$, $i = 1,\dots,n_k$, $k = 1,\dots,K$, and $\varepsilon_2$ is
smaller than the smallest strictly positive number out of $1 - x_{ik}$, $1 - y_k$, $i =
1,\dots,n_k$, $k = 1,\dots,K$. The value a can be chosen as $a = \min\{\varepsilon_1, 1 - \varepsilon_2\}$.


Fitting the coefficients is performed by solving a quadratic programming
problem (in the case of LS criterion)

$$\begin{aligned} \text{Minimize } & \sum_{k=1}^{K} \left( \sum_{j=1}^{J} c_j B_j(\mathbf{x}_k, y_k) \right)^2 \\ \text{s.t. } & \sum_{j=1}^{J} c_j B_j(e) = 0, \\ & \sum_{j=1}^{J} c_j B_j(a) = 1, \\ & c_j \ge 0, \end{aligned} \qquad (4.9)$$

or, in the case of LAD criterion,

$$\begin{aligned} \text{Minimize } & \sum_{k=1}^{K} \left| \sum_{j=1}^{J} c_j B_j(\mathbf{x}_k, y_k) \right| \\ \text{s.t. } & \sum_{j=1}^{J} c_j B_j(e) = 0, \\ & \sum_{j=1}^{J} c_j B_j(a) = 1, \\ & c_j \ge 0. \end{aligned} \qquad (4.10)$$

The functions $B_j$ are defined as

$$B_j(\mathbf{x}_k, y_k) = B_j(x_{1k}) + B_j(x_{2k}) + \dots + B_j(x_{n_k k}) - B_j(y_k). \qquad (4.11)$$

Standard QP and LP methods are then applied to the mentioned problems.

Next, consider the case when the value of the neutral element e is not
given, but has to be computed from the data. In this case one has to perform
numerical optimization with respect to e and spline coefficients $c_j$ (or parameters
$\lambda_T$, $\lambda_S$ and α > 0). This is a complicated constrained global optimization
problem. However it can be set as a bi-level optimization problem

$$\min_{e \in [0,1]} \left[ \min_{c_j} \sum_{k=1}^{K} \left( \sum_{j=1}^{J} c_j B_j(\mathbf{x}_k, y_k) \right)^2 \text{ s.t. linear conditions on } c_j \right]. \qquad (4.12)$$

The inner problem is exactly the same as (4.9) or (4.10), with a fixed
e, and the outer problem is a univariate global optimization problem. We
recommend using the Pijavski-Shubert deterministic method (see Appendix
A.5.5), which requires only the values of the objective function (expression in
the brackets). The latter is given as a solution to a QP or LP, which is not
expensive numerically. In any case, for reasonable accuracy, the outer problem
is not expected to require too many objective function evaluations.

As an example, Fig. 4.9 shows a graph of the additive generator found
by solving this problem with the data generated using the 3-Π uninorm
(Example 4.19), whose additive generator is also presented for comparison. 60
random data were generated. Note the accuracy with which the value of the
neutral element e has been computed.
Fig. 4.9. Fitting an additive generator of the 3-Π uninorm using empirical data.
The smooth curve is the true additive generator and the piecewise linear curve is its
spline approximation. The computed value of e was 0.498.


4.3 Nullnorms 


Uninorms (Section 4.2) are associative and symmetric aggregation functions
that act as t-norms when receiving low inputs and as t-conorms when dealing
with high values. Nullnorms¹⁰ are associative and symmetric aggregation functions
with the opposite behavior, that is, acting as t-conorms for low values
and as t-norms for high values. Similarly to uninorms, nullnorms are averaging
functions when dealing with mixed inputs (those including both low and
high values), but their behavior in such cases is much more restrictive, since
it is limited to a unique value (which coincides with the absorbing element).


4.3.1 Definition 


In a similar way to uninorms, nullnorms are defined as a generalization of 
t-norms and t-conorms by just modifying the axiom concerning the neutral 
element. The definition for the bivariate case is the following: 


Definition 4.22 (Nullnorm). A nullnorm is a bivariate aggregation function
$V : [0,1]^2 \to [0,1]$ which is associative, symmetric, such that there exists
an element a belonging to the open interval $]0,1[$ verifying

$$\forall t \in [0,a], \quad V(t,0) = t, \qquad (4.13)$$
$$\forall t \in [a,1], \quad V(t,1) = t. \qquad (4.14)$$


10 Nullnorms were introduced under this name in 2001 [49], but are equivalent to 
the so-called t-operators Ezi. 
Note 4.23. The original definition included an additional condition that the element
a is the absorbing element of V. This condition is redundant since it follows directly
from monotonicity and the conditions (4.13) and (4.14)¹¹.


Note 4.24. Nullnorms can also be defined allowing the element a to range over the 
whole interval [0,1], in which case t-norms and t-conorms become special limiting 
cases of nullnorms: 


• Taking a = 1, condition (4.13) states that 0 is the neutral element of V, and
then, by definition, V would be a t-conorm.

• Analogously, choosing a = 0, condition (4.14) states that 1 is the neutral element
of V, and this would entail that V is a t-norm.


Note 4.25. Since nullnorms are associative, they are extended in a unique way to 
functions with any number of arguments. Thus nullnorms constitute a class of ex- 
tended aggregation functions as defined in Definition [6] on p. Ø 


4.3.2 Main properties 


Figure 4.1 shows that a nullnorm with absorbing element a is disjunctive in
the square $[0,a]^2$ and conjunctive in $[a,1]^2$. In fact, any nullnorm V with
absorbing element a is related to a t-conorm $S_V$ and a t-norm $T_V$, known as
the underlying functions of V, such that:

$$\forall (x,y) \in [0,a]^2, \quad V(x,y) = a \cdot S_V\left(\frac{x}{a}, \frac{y}{a}\right),$$

$$\forall (x,y) \in [a,1]^2, \quad V(x,y) = a + (1-a) \cdot T_V\left(\frac{x-a}{1-a}, \frac{y-a}{1-a}\right).$$

On the remaining parts of the unit square, nullnorms return as the output
the absorbing element, i.e.:

$$\forall (x,y) \in [0,a] \times [a,1] \cup [a,1] \times [0,a], \quad V(x,y) = a.$$

In particular, it is $V(0,1) = V(1,0) = a$. The structure of nullnorms is depicted
in Figure 4.10 using the following notation:

$$S_V^*(x,y) = a \cdot S_V\left(\frac{x}{a}, \frac{y}{a}\right),$$

$$T_V^*(x,y) = a + (1-a) \cdot T_V\left(\frac{x-a}{1-a}, \frac{y-a}{1-a}\right).$$
Therefore, similarly to uninorms, each nullnorm univocally defines a t-norm
and a t-conorm. The converse (which is false in the case of uninorms) is
true for nullnorms, that is, given an arbitrary t-norm T, an arbitrary t-conorm
S and an element $a \in ]0,1[$, there is a unique nullnorm V with absorbing
element a such that $T_V = T$ and $S_V = S$. Such a nullnorm is given by

$$V_{T,S,a}(x,y) = \begin{cases} a \cdot S\left(\frac{x}{a}, \frac{y}{a}\right), & \text{if } (x,y) \in [0,a]^2, \\ a + (1-a) \cdot T\left(\frac{x-a}{1-a}, \frac{y-a}{1-a}\right), & \text{if } (x,y) \in [a,1]^2, \\ a & \text{otherwise.} \end{cases}$$


11 Indeed, for any $t \in [0,1]$, it is $a = V(a,0) \le V(a,t) \le V(a,1) = a$, and this
implies $V(a,t) = a$.

[Figure 4.10: the unit square split at the absorbing element a on both axes. Lower-left square $[0,a]^2$: disjunctive (scaled t-conorm $S_V^*$). Upper-right square $[a,1]^2$: conjunctive (scaled t-norm $T_V^*$). The two remaining rectangles: averaging, const = a.]

Fig. 4.10. Structure of a nullnorm with absorbing element a, underlying t-norm
$T_V$ and underlying t-conorm $S_V$.


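Thanks to associativity, the above result can be readily extended to the
general n-ary case as follows:

$$V_{T,S,a}(x_1,\dots,x_n) = \begin{cases} a \cdot S\left(\frac{x_1}{a},\dots,\frac{x_n}{a}\right), & \text{if } (x_1,\dots,x_n) \in [0,a]^n, \\ a + (1-a) \cdot T\left(\frac{x_1-a}{1-a},\dots,\frac{x_n-a}{1-a}\right), & \text{if } (x_1,\dots,x_n) \in [a,1]^n, \\ a & \text{otherwise.} \end{cases}$$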
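The construction of $V_{T,S,a}$ can be sketched as follows. The factory name `make_nullnorm` is hypothetical, the n-ary version is obtained by folding the bivariate function over the inputs (which relies on associativity), and the Łukasiewicz t-norm and t-conorm are used only as an illustrative pair.

```python
from functools import reduce
from typing import Callable

Binary = Callable[[float, float], float]


def make_nullnorm(t_norm: Binary, t_conorm: Binary, a: float):
    """Nullnorm V_{T,S,a} with absorbing element a, for any number of inputs."""
    if not 0.0 < a < 1.0:
        raise ValueError("the absorbing element a must lie in ]0, 1[")

    def v2(x: float, y: float) -> float:
        if x <= a and y <= a:   # scaled t-conorm on [0, a]^2
            return a * t_conorm(x / a, y / a)
        if x >= a and y >= a:   # scaled t-norm on [a, 1]^2
            return a + (1 - a) * t_norm((x - a) / (1 - a), (y - a) / (1 - a))
        return a                # mixed inputs

    def nullnorm(*xs: float) -> float:
        return reduce(v2, xs)   # n-ary extension by associativity

    return nullnorm


# Illustrative pair (an assumption of this sketch): Lukasiewicz t-norm and t-conorm.
def t_luk(x: float, y: float) -> float:
    return max(0.0, x + y - 1.0)


def s_luk(x: float, y: float) -> float:
    return min(1.0, x + y)


v = make_nullnorm(t_luk, s_luk, 0.6)
print(v(0.2, 0.3), v(0.9, 0.8), v(0.2, 0.9), v(0.1, 0.2, 0.9))
```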


Observe that any couple (t-norm, t-conorm) may be used to construct a 
nullnorm, independently of the properties that the t-norm and the t-conorm 
exhibit: for example, it is possible to choose two nilpotent functions, or two 
strict ones, or a mixed couple made of one nilpotent and one strict aggregation 
function, and so on. Of course, since both t-norms and t-conorms can be 
parameterized, the same happens to nullnorms. 

The main properties of nullnorms are easily deduced from their structure: 


Absorbing element As it was already mentioned in Note 4.23, each nullnorm
has the absorbing element a.

Duality The class of nullnorms is closed under duality, that is, the dual
function of a nullnorm V with respect to an arbitrary strong negation N,
defined as

$$V_d(x_1,\dots,x_n) = N(V(N(x_1),\dots,N(x_n))),$$

is also a nullnorm, and if V has absorbing element a, $V_d$ has absorbing
element N(a)¹². Contrary to the case of uninorms, it is possible to find self-dual
nullnorms, but a clear necessary condition for this is that a verifies
a = N(a)¹³. More specifically [171], a nullnorm $V_{T,S,a}$ is self-dual w.r.t. a
strong negation N if and only if the following two conditions hold:
1. $N(a) = a$;
2. $\forall (x,y) \in [0,1]^2$, $S(x,y) = \tilde{N}^{-1}(T(\tilde{N}(x), \tilde{N}(y)))$,
where $\tilde{N} : [0,1] \to [0,1]$ is the strict negation defined by
$\tilde{N}(t) = \frac{N(t \cdot a) - a}{1-a}$ (i.e., the underlying t-norm and t-conorm are dual
w.r.t. the strict negation $\tilde{N}$).

Continuity Nullnorms are continuous if and only if the underlying t-norm
and t-conorm are continuous.

Idempotency The only idempotent nullnorms are those related to the
unique idempotent t-norm, min, and the unique idempotent t-conorm,
max, that is, given a value $a \in ]0,1[$, there is only one idempotent nullnorm
with absorbing element a:

$$V_{\min,\max,a}(x_1,\dots,x_n) = \begin{cases} \max(x_1,\dots,x_n), & \text{if } (x_1,\dots,x_n) \in [0,a]^n, \\ \min(x_1,\dots,x_n), & \text{if } (x_1,\dots,x_n) \in [a,1]^n, \\ a & \text{otherwise.} \end{cases}$$

Idempotent nullnorms are nothing but the extended averaging functions
known as a-medians, already discussed in Section 2.8.1. Note that these
functions are self-dual with respect to any strong negation N with fixed
point a. A plot of one such function is given in Fig. 4.13.
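The link between the idempotent nullnorm and the a-median can be checked numerically. The sketch below assumes that the a-median of n inputs is the ordinary median of the inputs together with n - 1 copies of a; the function names are chosen for this illustration.

```python
import statistics
from typing import Sequence


def idempotent_nullnorm(xs: Sequence[float], a: float) -> float:
    """V_{min,max,a}: max on [0, a]^n, min on [a, 1]^n, a otherwise."""
    if max(xs) <= a:
        return max(xs)
    if min(xs) >= a:
        return min(xs)
    return a


def a_median(xs: Sequence[float], a: float) -> float:
    """Median of the inputs padded with n - 1 copies of a (assumed definition)."""
    padded = list(xs) + [a] * (len(xs) - 1)
    return statistics.median(padded)  # odd number of values: the middle one


samples = [[0.1, 0.3, 0.2], [0.7, 0.9], [0.1, 0.9, 0.6], [0.5, 0.5], [0.3]]
for xs in samples:
    print(xs, idempotent_nullnorm(xs, 0.5), a_median(xs, 0.5))
```

Because the padded list always has an odd number of elements, the median is one of the listed values and no interpolation takes place.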

Strict monotonicity Nullnorms may only be strictly monotone in the open
squares $]0,a[^n$ and $]a,1[^n$, and this will happen only when the underlying
t-norm and t-conorm are strictly monotone.


4.3.3 Examples 


In the following we describe some prototypical examples of nullnorms: 


Example 4.26 (Weakest nullnorm). Given $a \in ]0,1[$, the weakest nullnorm
with absorbing element a is the one constructed by means of the weakest
t-norm, $T_D$, and the weakest t-conorm, max. It has the following structure:

$$V_{T_D,\max,a}(x_1,\dots,x_n) = \begin{cases} \max(x_1,\dots,x_n), & \text{if } (x_1,\dots,x_n) \in [0,a]^n, \\ x_i, & \text{if } x_i \ge a \text{ and } x_j = 1 \text{ for all } j \ne i, \\ a & \text{otherwise.} \end{cases}$$


12 For details on the structure of $V_d$ see, e.g., [171].
13 In other words, a must coincide with the fixed point of the negation N.

Example 4.27 (Strongest nullnorm). Given $a \in ]0,1[$, the strongest nullnorm
with absorbing element a is the one constructed by means of the strongest
t-norm, min, and the strongest t-conorm, $S_D$:

$$V_{\min,S_D,a}(x_1,\dots,x_n) = \begin{cases} \min(x_1,\dots,x_n), & \text{if } (x_1,\dots,x_n) \in [a,1]^n, \\ x_i, & \text{if } x_i \le a \text{ and } x_j = 0 \text{ for all } j \ne i, \\ a & \text{otherwise.} \end{cases}$$


The structure of these two limiting nullnorms can be viewed, for the bivariate
case, in Figure 4.11. See also Figure 4.12 for a visualization of the
corresponding 3D plots when choosing a = 0.5 as absorbing element.

[Figure 4.11: two unit squares split at a on both axes. Left (weakest): max on the lower-left square, const = a on the other three regions. Right (strongest): min on the upper-right square, const = a on the other three regions.]

Fig. 4.11. Structure of the weakest (left) and the strongest (right) bivariate nullnorms
(Examples 4.26 and 4.27) with absorbing element a.


Example 4.28 (Łukasiewicz nullnorm). If the underlying t-norm and t-conorm
of a nullnorm are taken, respectively, as the Łukasiewicz t-norm and t-conorm
$T_L$ and $S_L$ (see Chapter 3), the following function is obtained in the bivariate
case (see Figure 4.13):

$$V_{T_L,S_L,a}(x,y) = \begin{cases} x + y, & \text{if } x + y \le a, \\ x + y - 1, & \text{if } x + y \ge 1 + a, \\ a & \text{otherwise.} \end{cases}$$


4.3.4 Calculation 


Calculation of the values of nullnorms is performed based on the scaled versions
of the underlying t-norm and t-conorm, with the constant value a assigned
outside the region $[0,a]^n \cup [a,1]^n$. Calculation can be performed using
recursive formulas, or by using additive generators. We refer to Section 3.4.13.

Fig. 4.12. 3D plots of the weakest and the strongest bivariate nullnorms (Examples
4.26 and 4.27) with absorbing element a = 0.5.


Fig. 4.13. 3D plots of the bivariate idempotent nullnorm $V_{\min,\max,a}$ with absorbing
element a = 0.5 (left) and Łukasiewicz nullnorm (Example 4.28) with absorbing
element a = 0.6 (right).


4.3.5 Fitting to the data 


We follow the approach used to fit uninorms to empirical data in
Section 4.2.6 (p. 211). We can use either parametric or non-parametric methods.
However, an important distinction from uninorms is that nullnorms are
defined uniquely by the underlying t-norm and t-conorm and the value of
the annihilator a. There is no freedom of choosing values outside the region
$[0,a]^n \cup [a,1]^n$.

In other words, given a fixed value of a, the underlying t-norm and t-conorm
are completely independent, and can be fitted independently using
the data which falls into $[0,a]^n$ and $[a,1]^n$. The underlying t-norm and t-conorm
can be either strict or nilpotent, or ordinal sums.

Since any continuous t-norm and t-conorm can be approximated arbitrarily
well by an Archimedean one, and continuous Archimedean t-norms and
t-conorms are defined by their additive generators, it makes sense to fit
additive generators in the same way as was done for t-norms in Section 3.4.15.
To do this, the data is split into three groups: the data on $[0,a]^n$, the data
on $[a,1]^n$ and the data elsewhere (group 3). The first two groups are used to
fit (scaled) additive generators of the underlying t-norm and t-conorm. The
third group is discarded.

Next consider fitting the unknown value of the annihilator a to the data.
Here we vary a over [0, 1] to minimize the least squares or LAD criterion. Now
all data (including group 3) are utilized in calculating the differences between
the predicted and observed values. This can be set as a bi-level optimization
problem, where at the outer level we minimize

$$\min_{a \in [0,1]} \sum_{k=1}^{K} (f_a(\mathbf{x}_k) - y_k)^2,$$

where $f_a(\mathbf{x})$ is the nullnorm computed from the underlying scaled t-norm and
t-conorm and a fixed parameter a, i.e., $f_a$ is the solution to a problem of type
(3.18) or (3.19) with an appropriate scaling.
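The outer level of this problem can be illustrated with a deliberately simplified sketch. The assumptions are substantial and are made only for the example: the underlying t-norm and t-conorm are fixed in advance as the Łukasiewicz pair of Example 4.28 (so the inner fitting problem disappears), the data are synthetic and noise-free, and a uniform grid search over a replaces a proper univariate global optimization method.

```python
import random


def lukasiewicz_nullnorm(x: float, y: float, a: float) -> float:
    """Bivariate Lukasiewicz nullnorm of Example 4.28."""
    if x + y <= a:
        return x + y
    if x + y >= 1.0 + a:
        return x + y - 1.0
    return a


def sse(a: float, data) -> float:
    """Least squares criterion for a candidate annihilator a, using all the data."""
    return sum((lukasiewicz_nullnorm(x, y, a) - z) ** 2 for x, y, z in data)


def fit_annihilator(data, steps: int = 100) -> float:
    """Grid search over a in ]0, 1[ (a stand-in for a global optimizer)."""
    grid = [i / steps for i in range(1, steps)]
    return min(grid, key=lambda a: sse(a, data))


# Synthetic data generated with a = 0.6 (hypothetical, for illustration only).
rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]
data = [(x, y, lukasiewicz_nullnorm(x, y, 0.6)) for x, y in points]

a_hat = fit_annihilator(data)
print(a_hat)
```

In this setting the mixed-input data (group 3) are the most informative about a, since the observed output there is the annihilator itself.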


4.4 Generated functions 


We have studied in Chapters 2-4 several types of aggregation procedures
which follow essentially the same pattern:

• transform each input value using (possibly different) univariate functions;
• add the transformed values;
• return as the output some transformation of that sum.


The term transformation is understood as application of a univariate func- 
tion, fulfilling some basic properties. Such aggregation functions are known as 
generated functions, or generated operators. The role of generators is played 
by the univariate functions used to transform the input and output values. 

Several generated functions have already been studied in this book, namely 
weighted quasi-arithmetic means, Archimedean triangular norms and conorms 
and representable uninorms (see Section [4.4.1] below). This means that the 
class of generated functions comprises extended aggregation functions with 
all kinds of behavior, that is, there are averaging, conjunctive, disjunctive and 
mixed generated functions. In this section we summarize the most general 
properties of generated functions, and study three specific classes of mixed 
aggregation functions, not mentioned earlier. 
4.4.1 Definitions


Definition 4.29 (Generated function). Let $g_1, \ldots, g_n : [0,1] \to [-\infty, +\infty]$
be a family of continuous non-decreasing functions and let
$h : \sum_{i=1}^{n} \mathrm{Ran}(g_i) \to [0,1]$ be a continuous non-decreasing
surjection.^14 The function $f : [0,1]^n \to [0,1]$ given by

$$f(x_1, \ldots, x_n) = h(g_1(x_1) + \ldots + g_n(x_n))$$

is called a generated function, and $(\{g_i\}_{i \in \{1,\ldots,n\}}, h)$ is called a
generating system.


Note 4.30. Observe that if there exist $j, k \in \{1, \ldots, n\}$, $j \neq k$, such that
$g_j(0) = -\infty$ and $g_k(1) = +\infty$, then the generated function is not defined at
the points such that $x_j = 0$ and $x_k = 1$, since the summation $-\infty + \infty$ or
$+\infty - \infty$ appears in these cases. When this occurs, a convention, such as
$-\infty + \infty = +\infty - \infty = -\infty$, is adopted, and (see below) continuity of f
is lost.


The monotonicity of the functions that form the generating system, along 
with the fact that h is surjective, provides the following result: 


Proposition 4.31. Every generated function is an aggregation function in 
the sense of Definition [Z.3] 


Several important families of generated functions have already been ana- 
lyzed in this book. In particular, recall the following families: 


• Continuous Archimedean t-norms^15 (Proposition 3.37)

$$T(\mathbf{x}) = g^{(-1)}(g(x_1) + \ldots + g(x_n)),$$

  where $g : [0,1] \to [0,\infty]$, $g(1) = 0$, is a continuous strictly decreasing
  function and $g^{(-1)}$ is its pseudo-inverse, are generated functions with
  generating system given by $g_i(t) = -g(t)$ and $h(t) = g^{(-1)}(-t)$.

• Continuous Archimedean t-conorms (Proposition 3.45)

$$S(\mathbf{x}) = g^{(-1)}(g(x_1) + \ldots + g(x_n)),$$

  where $g : [0,1] \to [0,\infty]$, $g(0) = 0$, is a continuous strictly increasing
  function and $g^{(-1)}$ is its pseudo-inverse, are generated functions with
  generating system given by $g_i(t) = g(t)$ and $h(t) = g^{(-1)}(t)$.


14 That is, Ran(h) = [0, 1].
15 In particular, Archimedean copulas.
• Weighted Archimedean t-norms (Definition 3.91)

$$T_{\mathbf{w}}(\mathbf{x}) = g^{(-1)}(w_1 g(x_1) + \ldots + w_n g(x_n)),$$

  where $g : [0,1] \to [0,\infty]$, $g(1) = 0$, is a continuous strictly decreasing
  function, $g^{(-1)}$ is its pseudo-inverse, and w is a weighting vector, are
  generated functions with generating system given by $g_i(t) = -w_i g(t)$ and
  $h(t) = g^{(-1)}(-t)$.

• Weighted Archimedean t-conorms (Definition 3.98)

$$S_{\mathbf{w}}(\mathbf{x}) = g^{(-1)}(w_1 g(x_1) + \ldots + w_n g(x_n)),$$

  where $g : [0,1] \to [0,\infty]$, $g(0) = 0$, is a continuous strictly increasing
  function, $g^{(-1)}$ is its pseudo-inverse, and w is a weighting vector, are
  generated functions with generating system given by $g_i(t) = w_i g(t)$ and
  $h(t) = g^{(-1)}(t)$.

• Weighted quasi-arithmetic means (Definition 2.18)

$$M_{\mathbf{w},g}(\mathbf{x}) = g^{-1}(w_1 g(x_1) + \ldots + w_n g(x_n)),$$

  where $g : [0,1] \to [-\infty,+\infty]$ is a continuous strictly monotone function
  and w is a weighting vector, are generated functions with generating
  system given by $g_i(t) = w_i g(t)$ and $h(t) = g^{-1}(t)$ (or $g_i(t) = -w_i g(t)$
  and $h(t) = g^{-1}(-t)$ if g is decreasing).

• Representable uninorms (Definition 4.11)

$$U(\mathbf{x}) = u^{-1}(u(x_1) + \ldots + u(x_n)), \quad \{0,1\} \not\subseteq \{x_1, \ldots, x_n\},$$

  where $u : [0,1] \to [-\infty,+\infty]$ is a strictly increasing bijection, are
  generated functions with generating system given by $g_i(t) = u(t)$ and
  $h(t) = u^{-1}(t)$.
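The generating systems listed above can be tried out numerically. The
following Python sketch is illustrative only: the helper `generated` and the
three concrete systems are assumptions chosen for the example (product
t-norm, Łukasiewicz t-norm and a weighted arithmetic mean), not code from
the book. Infinite generator values are represented by `math.inf`.

```python
import math


def generated(gs, h):
    """Return f(x) = h(g_1(x_1) + ... + g_n(x_n)) for the system (gs, h)."""
    def f(xs):
        return h(sum(g(x) for g, x in zip(gs, xs)))
    return f


def safe_log(t):
    # log extended by log(0) = -inf, so that g_i(0) = -inf is representable
    return math.log(t) if t > 0 else -math.inf


# Product t-norm: additive generator g(t) = -log t, hence
# g_i(t) = -g(t) = log t and h(t) = g^(-1)(-t) = exp(t).
product_tnorm = generated([safe_log, safe_log], math.exp)

# Lukasiewicz t-norm: g(t) = 1 - t with pseudo-inverse max(0, 1 - s), hence
# g_i(t) = t - 1 and h(t) = max(0, 1 + t).
lukasiewicz_tnorm = generated([lambda t: t - 1.0] * 2,
                              lambda s: max(0.0, 1.0 + s))

# Weighted arithmetic mean with weights (0.25, 0.75):
# g_i(t) = w_i * t and h(t) = t.
wam = generated([lambda t: 0.25 * t, lambda t: 0.75 * t], lambda s: s)
```

The three functions differ only in the generating system passed to
`generated`, which is the point of the definition: conjunctive and averaging
behavior arise from the same template.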


Note 4.32. Observe that different generating systems may define the same generated
function. A simple example of this situation is given by the generating systems
$(g_i(t), h(t))$ and $(c\,g_i(t), h(t/c))$, $c \in \mathbb{R}$, $c > 0$, that generate the same
function. Another example is provided by weighted quasi-arithmetic means, which,
as mentioned in Chapter 2, are equivalent up to affine transformations of their
generating functions g, and this translates into different generating systems.


Note 4.33. Weighted quasi-arithmetic means are special cases of generated
functions. They include weighted arithmetic means, obtained when choosing
g(t) = t, with associated generating systems of the form $g_i(t) = w_i t$ and
h(t) = t. Weighted arithmetic means form a special class of generated functions,
since a generated function is a weighted arithmetic mean if and only if it can
be generated by an affine generating system (i.e., a system such that
$g_i(t) = a_i t + b_i$ and $h(t) = at + b$, with $a, a_i, b, b_i \in \mathbb{R}$, $a, a_i > 0$).


Note that Definition 4.29 can be easily upgraded to families of generated
aggregation functions as follows:

Definition 4.34 (Extended generated aggregation function). An extended
function

$$F : \bigcup_{n \in \{1,2,\ldots\}} [0,1]^n \to [0,1],$$

verifying F(t) = t for any $t \in [0,1]$, and such that all its restrictions to
$[0,1]^n$, n > 1, are generated functions, is called an extended generated
aggregation function.


4.4.2 Main properties 


Continuity Generated functions are always continuous on their whole domain
    except for some very specific situations involving the aggregation of
    contradictory information (tuples containing a 0 and a 1). More precisely:

    • Generated functions such that $\mathrm{Dom}(h) \neq [-\infty, +\infty]$ are always
      continuous on the whole unit cube.

    • Generated functions such that $\mathrm{Dom}(h) = [-\infty, +\infty]$ are continuous
      except for the case when there exist $j, k \in \{1, \ldots, n\}$, $j \neq k$, such
      that $g_j(0) = -\infty$ and $g_k(1) = +\infty$, in which case the function
      generated by the system $(\{g_i\}_{i \in \{1,\ldots,n\}}, h)$ is discontinuous at the
      points x such that $x_j = 0$ and $x_k = 1$. An example of this situation is
      given by representable uninorms (Section 4.2). Note however (see
      Example 4.45 below) that the condition $\mathrm{Dom}(h) = [-\infty, +\infty]$ by itself
      does not necessarily entail lack of continuity.

Symmetry Generated functions may clearly be either symmetric or asymmetric.
    Moreover, a generated function is symmetric if and only if it can
    be generated by a generating system such that for all $i \in \{1, \ldots, n\}$ it is
    $g_i = g$, where $g : [0,1] \to [-\infty, +\infty]$ is any continuous non-decreasing
    function [151].

Idempotency There are many different classes of non-idempotent generated
    functions (e.g., continuous Archimedean t-norms/t-conorms or
    representable uninorms). There are also different idempotent generated
    functions, such as weighted quasi-arithmetic means. In fact, a generating
    system $(\{g_i\}_{i \in \{1,\ldots,n\}}, h)$ provides an idempotent generated function if
    and only if $h^{-1}(t) = \sum_{i=1}^{n} g_i(t)$ for any $t \in [0,1]$ [151].

    Note 4.35. This result, along with the result about the symmetry, entails that
    quasi-arithmetic means are the only idempotent symmetric generated functions.


Duality The class of generated functions is closed under duality with respect
    to any strong negation. Indeed, given a strong negation N, the dual of a
    generated function with generating system $(\{g_i\}_{i \in \{1,\ldots,n\}}, h)$ is a
    function generated by the system $(\{(g_i)_d\}_{i \in \{1,\ldots,n\}}, h_d)$ where

$$(g_i)_d(t) = -g_i(N(t)), \qquad h_d(t) = N(h(-t)).$$

    Of course, this allows for N-self-dual generated functions. Some of them
    have already been mentioned in this book, such as weighted
    quasi-arithmetic means built from bounded generating functions (see
    Section 2.3) or representable uninorms up to the tuples containing the
    values 0 and 1 (Proposition 4.14).

Neutral element Some generated functions, such as the ones belonging to 
the classes of t-norms, t-conorms and uninorms, possess a neutral element, 
whereas others, such as the arithmetic mean, do not. The following full 
characterization of the class of extended generated functions possessing a 
(strong) neutral element is available 


Proposition 4.36. An extended generated aggregation function F has a
neutral element $e \in [0,1]$ if and only if for each n > 1 the restriction of F
to $[0,1]^n$, $f_n$, can be expressed as

$$f_n(x_1, \ldots, x_n) = g^{(-1)}(g(x_1) + \ldots + g(x_n)) \qquad (4.15)$$

where $g : [0,1] \to [-\infty, +\infty]$ is a continuous non-decreasing function such
that g(e) = 0 with the pseudo-inverse (see Section 3.4.6) $g^{(-1)}$.


The n-ary generated functions of the above characterization can be
classified as follows:

1. If e = 0, then $f_n$ is a continuous Archimedean t-conorm.
2. If e = 1, then $f_n$ is a continuous Archimedean t-norm.
3. If $e \in\, ]0,1[$:
   a) If $\mathrm{Ran}(g) = [-\infty, +\infty]$, $f_n$ is a representable uninorm, and, hence,
      associative but non-continuous at the points that simultaneously
      contain 0 and 1.
   b) Otherwise, $f_n$ is continuous and non-associative (see next section).


Note 4.37. The characterization given in Proposition 4.36 implies that
generated functions with a neutral element are necessarily symmetric.


Associativity Instances of associative as well as non-associative generated
    functions have been given in Section 4.4.1. Among the associative
    functions, one finds either continuous (e.g., t-norms and t-conorms) or
    non-continuous (e.g., uninorms) functions. Regarding continuous associative
    generated functions, it is known that these can only be projection
    functions, Archimedean t-norms, Archimedean t-conorms or nullnorms [46].


4.4.3 Classes of generated functions 


The following classes of generated functions have been already studied in
detail:

• Weighted quasi-arithmetic means (Section 2.3);
• Continuous Archimedean t-norms and t-conorms (Section 3.4.4);
• Weighted Archimedean t-norms and t-conorms (Section 3.4.16);
• Representable uninorms (Section 4.2.3).


In this section we examine three other classes of mixed generated aggregation
functions. The first two classes (continuous generated functions with a neutral
element and weighted uninorms) are related to t-norms and t-conorms.


Continuous generated functions with a neutral element 


Recall from Section 4.2, Proposition 4.14, that representable uninorms (class
3.a above) behave as strict t-norms on $[0,e]^n$ and as strict t-conorms on
$[e,1]^n$. Unfortunately, uninorms are discontinuous, which implies that small
changes to the inputs (in the neighborhood of discontinuity) lead to large
changes in the output values. Discontinuity is the price for associativity [46,
99], and several authors abandoned the associativity requirement (e.g., Yager's
generalized uninorm operator GenUNI [270]).

We note that associativity is not required to define aggregation functions 
for any number of arguments, i.e., extended aggregation functions. Generated 
aggregation functions are an example of an alternative way. We shall now 
explore a construction similar to that of uninorms, which delivers continuous 
mixed aggregation functions, case 3.b) above. 

Thus we consider aggregation functions defined by 


$$f(\mathbf{x}) = g^{(-1)}(g(x_1) + \ldots + g(x_n)), \qquad (4.16)$$


with $g : [0,1] \to [-\infty, +\infty]$ a continuous non-decreasing function such that
g(e) = 0, $g^{(-1)}$ its pseudo-inverse, and $\mathrm{Ran}(g) \subsetneq [-\infty, +\infty]$ [184].
According to Proposition 4.36, f has neutral element e. Moreover, it is
continuous on $[0,1]^n$, it acts on $[0,e]^n$ as a continuous scaled t-norm T built
from the additive generator $g_T(t) = -g(et)$, and it acts on $[e,1]^n$ as a
continuous scaled t-conorm S built from the additive generator
$g_S(t) = g(e + (1-e)t)$.


Note 4.38. Either T, or S, or both are necessarily nilpotent (if both are strict,
$\mathrm{Ran}(g) = [-\infty, +\infty]$ and we obtain a representable uninorm).


Conversely, given a value $e \in\, ]0,1[$, a continuous t-norm T with an additive
generator $g_T$ and a continuous t-conorm S with an additive generator $g_S$, the
mapping $g : [0,1] \to [-\infty, +\infty]$ given by

$$g(t) = \begin{cases} -g_T\left(\frac{t}{e}\right), & \text{if } t \in [0,e], \\ g_S\left(\frac{t-e}{1-e}\right), & \text{if } t \in\, ]e,1], \end{cases}$$

defines a generated aggregation function by (4.16), with the neutral element
e. If either T or S or both are nilpotent, then f is continuous.
Note 4.39. Since the additive generators of T and S are defined up to an arbitrary
positive multiplier, it is possible to use different $g_T$ and $g_S$ which produce the
same t-norm and t-conorm on $[0,e]^n$ and $[e,1]^n$, but different values on the rest
of the domain. Thus we can use

$$g(t) = \begin{cases} -a\,g_T\left(\frac{t}{e}\right), & \text{if } t \in [0,e], \\ b\,g_S\left(\frac{t-e}{1-e}\right), & \text{if } t \in\, ]e,1], \end{cases}$$

with arbitrary a, b > 0.


Examples 4.41-4.43 on pp. 227-229 illustrate continuous generated functions
with a neutral element.
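The gluing construction and the effect of the multipliers a and b can be
checked numerically. The Python sketch below is an illustration under stated
assumptions, not the book's implementation: the pseudo-inverse is computed by
bisection (which presumes g is strictly increasing on [0, 1]), 0 < e < 1 is
required, and the names `pseudo_inverse`, `make_generator` and
`generated_function` are hypothetical. Łukasiewicz generators are used for
both T and S.

```python
def pseudo_inverse(g, s, iters=60):
    """Pseudo-inverse of a continuous increasing g on [0, 1], by bisection."""
    if s <= g(0.0):
        return 0.0
    if s >= g(1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def make_generator(g_t, g_s, e, a=1.0, b=1.0):
    """Glue -a*g_T(t/e) on [0, e] with b*g_S((t-e)/(1-e)) on ]e, 1]."""
    def g(t):
        if t <= e:
            return -a * g_t(t / e)
        return b * g_s((t - e) / (1.0 - e))
    return g


def generated_function(g):
    """f(x) = g^(-1)(g(x_1) + ... + g(x_n)), as in (4.16)."""
    return lambda xs: pseudo_inverse(g, sum(g(x) for x in xs))


# Lukasiewicz additive generators: g_T(t) = 1 - t and g_S(t) = t.
e = 0.5
g_T = lambda t: 1.0 - t
g_S = lambda t: t
# a = e, b = 1 - e gives g(t) = t - e on the whole of [0, 1].
f1 = generated_function(make_generator(g_T, g_S, e, a=e, b=1 - e))
# Doubling b gives g(t) = 2(t - e) on ]e, 1] and leaves [0, e] unchanged.
f2 = generated_function(make_generator(g_T, g_S, e, a=e, b=2 * (1 - e)))
```

With these choices `f1` and `f2` agree on $[0,e]^n$ and on $[e,1]^n$ but
differ for mixed inputs such as (0.2, 0.9), which is the behavior described
in Note 4.39.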


Weighted generated uninorms 


Recall the method of introducing weights in t-norms and t-conorms presented
in Section 3.4.16, which is based on using

$$T_{\mathbf{w}}(\mathbf{x}) = g^{(-1)}\left(\sum_{i=1}^{n} w_i g(x_i)\right),$$


with a weighting vector w, not necessarily normalized! [a7 lag, 261|, Bz. 
Yager and Calvo and Mesiar extended this approach to weighted 
uninorms, defined as 


Definition 4.40 (Weighted generated uninorm). Let $u : [0,1] \to [-\infty, \infty]$
be an additive generator of some representable uninorm U, and w, $w_i \geq 0$,
be a weighting vector (not necessarily normalized). The weighted generated
uninorm is defined as

$$U_{\mathbf{w}}(\mathbf{x}) = u^{-1}\left(\sum_{i=1}^{n} w_i u(x_i)\right). \qquad (4.17)$$


Recall from Section 3.4.16 that introduction of weights can be expressed
through an importance transformation function H defined by

$$H(w,t) = u^{-1}(w\,u(t)),$$

so that

$$U_{\mathbf{w}} = U(H(w_1, x_1), \ldots, H(w_n, x_n)),$$

provided that $\max w_i = 1$. The function $H : [0,1]^2 \to [0,1]$ in the case of
uninorms should satisfy the following properties:

• $H(1,t) = u^{-1}(u(t)) = t$;
• $H(0,t) = u^{-1}(0) = e$;
• $H(w,e) = u^{-1}(0) = e$;
• H(w,t) is non-decreasing in t, and is non-increasing in w when $t \leq e$ and
  non-decreasing in w if $t \geq e$.


16 I.e., a weighting vector such that the condition $\sum w_i = 1$ is not mandatory.


Example 4.44 on p. 229 illustrates a weighted generated uninorm. A
weighted uninorm is neither associative nor symmetric, and it does not have
a neutral element (except in the limiting case $w_i = 1$, i = 1,...,n).

The approach to introducing weights using additive generators is easily
extended to the class of continuous generated functions with a neutral element
$e \in\, ]0,1[$ given by (4.16). It is sufficient to take

$$f_{\mathbf{w}}(\mathbf{x}) = g^{(-1)}\left(\sum_{i=1}^{n} w_i g(x_i)\right), \qquad (4.18)$$

which differs from (4.17) because it uses the pseudo-inverse of g. Since
$\mathrm{Ran}(g) \neq [-\infty, \infty]$, this aggregation function is continuous.


Asymmetric generated functions 


Finally, consider the case when the generating functions $g_i$ in Definition 4.29
have different analytical forms. Of course, the resulting generated function will
be asymmetric. This construction gives rise to a large number of parametric
families of aggregation functions of different types. An example of such a
function is presented in Example 4.45 on p. 230.


4.4.4 Examples 


Continuous generated functions with a neutral element 


Example 4.41. [43] Proposition 4.36 allows one to easily build generated
functions with a given neutral element. For example, let $e \in [0,1]$
and consider the function $g : [0,1] \to [-\infty, +\infty]$, defined as g(t) = t - e,
which is continuous, non-decreasing and such that g(e) = 0. It is
$\mathrm{Ran}(g) = [g(0), g(1)] = [-e, 1-e]$, and its pseudo-inverse $g^{(-1)}$ is given by

$$g^{(-1)}(t) = \begin{cases} 1, & \text{if } t > 1-e, \\ t + e, & \text{if } -e \leq t \leq 1-e, \\ 0, & \text{if } t < -e. \end{cases}$$

Then the function $f : [0,1]^n \to [0,1]$ defined by (4.16) is given by

$$f(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n}(x_i - e) + e > 1, \\ \sum_{i=1}^{n}(x_i - e) + e, & \text{if } 0 \leq \sum_{i=1}^{n}(x_i - e) + e \leq 1, \\ 0, & \text{if } \sum_{i=1}^{n}(x_i - e) + e < 0, \end{cases}$$

which alternatively can be expressed as

$$f(x_1, \ldots, x_n) = \max\left(0, \min\left(1, e + \sum_{i=1}^{n}(x_i - e)\right)\right).$$

f is a generated aggregation function with neutral element e. If e = 0 it is
nothing but the Łukasiewicz t-conorm, whereas e = 1 provides the Łukasiewicz
t-norm. Otherwise, since $\mathrm{Ran}(g) \neq [-\infty, +\infty]$, f is continuous but is not
associative. Further, on $[0,e]^n$ it acts as the continuous t-norm with additive
generator $g_T(t) = -g(te) = (1-t)e$ and on $[e,1]^n$ it acts as the continuous
t-conorm with additive generator $g_S(t) = g(e + (1-e)t) = (1-e)t$, i.e., it is
an ordinal sum of the Łukasiewicz t-norm and the Łukasiewicz t-conorm.^17

A 3D plot of this function in the bivariate case with e = 0.5 is presented
on Figure 4.14.


Example 4.42. Let us define a function f in (4.16) using the generating
function

$$g(t) = \begin{cases} t - e, & \text{if } 0 \leq t < e, \\ 2(t - e), & \text{if } e \leq t \leq 1. \end{cases}$$

On $[0,e]^n$ and $[e,1]^n$ f acts, respectively, as the Łukasiewicz t-norm and
t-conorm, but it differs from the function in Example 4.41 on the rest of the
domain, see Note 4.39.


Fig. 4.14. 3D plots of the generated aggregation functions in Examples 4.41 (left)
and 4.42 (right) with e = 0.5.
17 Here we use the term "ordinal sum" informally, in the sense that f acts as
Łukasiewicz t-norm and t-conorm on $[0,e]^n$ and $[e,1]^n$ only. In a more formal
sense, ordinal sum implies that f is either minimum or maximum on the rest of
the domain, see Section 3.4.9.


Example 4.43. Consider a mixed aggregation function which acts as product
on $[0,e]^n$ and as Łukasiewicz t-conorm on $[e,1]^n$. One such function is given
by (4.16) with

$$g(t) = \begin{cases} \log(t/e), & \text{if } 0 \leq t < e, \\ t - e, & \text{if } e \leq t \leq 1. \end{cases}$$

Its plot is presented on Figure 4.15. Note that it has the neutral element e
and the absorbing element a = 0, compare with Note 1.32 on p. 13.
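For this generator the pseudo-inverse has a closed form, so the function can
be evaluated directly. The sketch below is an illustration only: the closed
form $g^{(-1)}(s) = e\,\exp(s)$ for s < 0 and $\min(1, s + e)$ otherwise is
derived here from the definition of g, and the names `g`, `g_pinv` and `f`
are hypothetical. The value $g(0) = -\infty$ is represented by `-math.inf`.

```python
import math

E = 0.5  # neutral element used for the illustration


def g(t, e=E):
    """Generator: log(t/e) on [0, e[, t - e on [e, 1]."""
    if t == 0.0:
        return -math.inf
    return math.log(t / e) if t < e else t - e


def g_pinv(s, e=E):
    """Pseudo-inverse of g: inverts the log branch for s < 0 and the
    linear branch (clipped at 1) for s >= 0."""
    if s < 0:
        return e * math.exp(s)
    return min(1.0, s + e)


def f(xs, e=E):
    """Generated function (4.16) for this generator."""
    return g_pinv(sum(g(x, e) for x in xs), e)
```

On $[0,e]^2$ this gives the scaled product xy/e, on $[e,1]^2$ it gives
min(1, x + y - e), and any input equal to 0 forces the output to 0.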


Fig. 4.15. 3D plots of the generated aggregation functions in Example 4.43 with
e = 0.5 (left) and Example 4.45 (right).


Weighted generated uninorms 


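Example 4.44. Take the generating function

$$u_\lambda(t) = \log\left(\frac{\lambda t}{1 - t}\right), \quad \lambda > 0,$$

which was used in Example 4.18 on p. 208 to generate representable uninorms

$$U_\lambda(x, y) = \frac{\lambda x y}{\lambda x y + (1 - x)(1 - y)}.$$

The weighted version of this uninorm is given by

$$U_{\lambda,\mathbf{w}}(\mathbf{x}) = \frac{\lambda^{\sum_i w_i} \prod_{i=1}^{n} x_i^{w_i}}{\lambda^{\sum_i w_i} \prod_{i=1}^{n} x_i^{w_i} + \lambda \prod_{i=1}^{n} (1 - x_i)^{w_i}}.$$

Of course, an appropriate convention 0/0 = 0 or = 1 is needed.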
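Formula (4.17) can be evaluated directly from a generator. The Python sketch
below is illustrative and rests on assumptions that are not taken from the
text: it uses the generator u(t) = log(t/(1 - t)), whose neutral element is
e = 0.5, it skips zero weights to avoid the product 0 times infinity, and it
adopts the conjunctive convention (output 0) when the weighted sum would be
$-\infty + \infty$. All function names are hypothetical.

```python
import math


def u(t):
    """Additive generator u(t) = log(t / (1 - t)), extended to 0 and 1."""
    if t == 0.0:
        return -math.inf
    if t == 1.0:
        return math.inf
    return math.log(t / (1.0 - t))


def u_inv(s):
    """Inverse of u (the logistic function), written to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def weighted_uninorm(xs, ws):
    """U_w(x) = u^{-1}(sum_i w_i * u(x_i)), cf. (4.17)."""
    terms = [w * u(x) for x, w in zip(xs, ws) if w > 0]
    if -math.inf in terms and math.inf in terms:
        return 0.0  # assumed convention for -inf + inf
    return u_inv(sum(terms))
```

For inputs strictly inside ]0,1[ the result coincides with the product form
$\prod x_i^{w_i} / (\prod x_i^{w_i} + \prod (1 - x_i)^{w_i})$, and an input
with weight 0 has no influence on the output.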
Asymmetric generated functions


Example 4.45 [151]. Let n = 2, $g_1(t) = \log\left(\frac{t}{1-t}\right)$, $g_2(t) = \log(1 + t)$ and
$h(t) = g_1^{-1}(t) = \frac{e^t}{1 + e^t}$. Then the generated function
$f(x,y) = h(g_1(x) + g_2(y))$ is given by

$$f(x, y) = \frac{x + xy}{1 + xy}.$$

This function is continuous (although $\mathrm{Dom}(h) = [-\infty, +\infty]$), asymmetric and
not associative. A 3D plot of this function is presented on Figure 4.15.
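The closed form of this example can be verified against the generating
system. The short Python check below is a sketch; the function names are
hypothetical, and the infinite values $g_1(0) = -\infty$ and
$g_1(1) = +\infty$ are represented with `math.inf`.

```python
import math


def g1(t):
    """g_1(t) = log(t / (1 - t)), extended to the endpoints."""
    if t == 0.0:
        return -math.inf
    if t == 1.0:
        return math.inf
    return math.log(t / (1.0 - t))


def g2(t):
    """g_2(t) = log(1 + t)."""
    return math.log(1.0 + t)


def h(s):
    """h = g_1^{-1}, i.e. exp(s) / (1 + exp(s)), written to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def f(x, y):
    """Generated function h(g_1(x) + g_2(y))."""
    return h(g1(x) + g2(y))


def f_closed(x, y):
    """Closed form (x + xy) / (1 + xy)."""
    return (x + x * y) / (1.0 + x * y)
```

Because $g_2$ is bounded, the sum $g_1(x) + g_2(y)$ never takes the form
$-\infty + \infty$, which is why no convention is needed and continuity is
preserved.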


4.4.5 Fitting to the data 


Fitting continuous generated functions to empirical data can be done using
the approach discussed in Section 4.2.6, with the following difference: the
asymptotic behavior expressed in (4.8) is no longer needed, and it can be
replaced by a condition expressed in (3.18) in the case of nilpotent t-norms,
namely S(a) = -1.

Fitting parameters of weighted uninorms involves two issues: fitting the
weighting vector and fitting an additive generator. When the weighting vector
is fixed, the problem of fitting an additive generator is very similar to the
one discussed in Section 4.2.6, except that the functions $B_j$ in (4.11) are
multiplied by the corresponding weights. When the generator is fixed, we have
the problem analogous to fitting the weights of weighted quasi-arithmetic
means, discussed in Section 2.3.7, equation (2.12), except that the condition
$\sum w_i = 1$ is no longer required. However, when both the weighting vector and
a generator need to be found, the problem becomes that of global optimization,
expressed in (2.19).


4.5 T-S functions 


Uninorms and nullnorms (Sections 4.2 and 4.3) are built from t-norms and
t-conorms in a way similar to ordinal sums, that is, they act as t-norms or as 
t-conorms in some specific parts of the domain. A rather different approach, 
still remaining close to t-norms and t-conorms, is the one adopted in the family 
of functions known as compensatory T-S functions (T-S functions for short), 
whose aim is to combine a t-norm and a t-conorm in order to compensate their 
opposite effects. Contrary to uninorms and nullnorms, T-S functions exhibit 
a uniform behavior, in the sense that their behavior does not depend on the 
part of the domain under consideration. They work by separately applying a 
t-norm and a t-conorm to the given input and then averaging the two values 
obtained in this way by means of some weighted quasi-arithmetic mean. 

The first classes of T-S functions were introduced in with the aim of 
modeling human decision making processes, and have been studied in more 
detail and generalized to wider classes of functions 82A [84, [2og, Bah. They 
were first applied in fuzzy linear programming 158| and fuzzy car control 


kad. 
4.5.1 Definitions

The first introduced T-S functions were known as gamma operators:


Definition 4.46 (Gamma-operator). Given $\gamma \in\, ]0,1[$, the gamma aggregation
operator with parameter $\gamma$ is defined as^18

$$f_\gamma(x_1, \ldots, x_n) = \left(\prod_{i=1}^{n} x_i\right)^{1-\gamma} \left(1 - \prod_{i=1}^{n} (1 - x_i)\right)^{\gamma}.$$

Gamma operators perform an exponential combination of two particular 
functions — the product t-norm Tp and its dual t-conorm Sp, so actually they 


are special instances of a wider class of functions known as exponential convex 
T-S functions: 


Definition 4.47 (Exponential convex T-S function). Given $\gamma \in\, ]0,1[$, a
t-norm T and a t-conorm S, the corresponding exponential convex T-S
function is defined as

$$E_{\gamma,T,S}(x_1, \ldots, x_n) = (T(x_1, \ldots, x_n))^{1-\gamma} \cdot (S(x_1, \ldots, x_n))^{\gamma}.$$


Note that the t-norm and the t-conorm involved in the construction of an 
exponential convex T-S function can be dual to each other, as is the case with 
gamma operators, but this is not necessary. 

Another approach to compensation is to perform a linear convex combi- 
nation of a t-norm and a t-conorm: 


Definition 4.48 (Linear convex T-S function). Given $\gamma \in\, ]0,1[$, a t-norm
T and a t-conorm S, the corresponding linear convex T-S function is defined
as

$$L_{\gamma,T,S}(x_1, \ldots, x_n) = (1 - \gamma) \cdot T(x_1, \ldots, x_n) + \gamma \cdot S(x_1, \ldots, x_n).$$


It is clear that both exponential and linear convex T-S functions are obtained
from the composition (see Section 1.5) of a t-norm and a t-conorm with
a bivariate aggregation function $M_\gamma : [0,1]^2 \to [0,1]$, i.e., they are defined as

$$M_\gamma(T(x_1, \ldots, x_n), S(x_1, \ldots, x_n)),$$

where in the case of exponential functions, $M_\gamma(x,y)$ is given by $x^{1-\gamma} y^{\gamma}$,
whereas for linear functions it is $M_\gamma(x,y) = (1 - \gamma)x + \gamma y$. Moreover, the
two mentioned outer functions $M_\gamma$ are particular instances of the family of
weighted quasi-arithmetic means. In order to see this, let us first recall the
definition, in the bivariate case, of these functions (see Definition 2.18 on
p. 48 for the general case):


18 This function should not be confused with the gamma function, frequently used
in mathematics, which is an extension of the factorial function n! for non-integer
and complex arguments. It is defined as $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt$. It has the
property $\Gamma(z+1) = z\Gamma(z)$, and since $\Gamma(1) = 1$, we have $\Gamma(n+1) = n!$ for
natural numbers n. Its most well known value for non-integer z is
$\Gamma(1/2) = \sqrt{\pi}$.

Definition 4.49 (Weighted quasi-arithmetic mean). Given $\gamma \in\, ]0,1[$ and
a continuous strictly monotone function $g : [0,1] \to [-\infty, \infty]$, the
corresponding bivariate weighted quasi-arithmetic mean is defined as

$$M_{\gamma,g}(x, y) = g^{-1}\left((1 - \gamma) g(x) + \gamma g(y)\right).$$

The function g is called a generating function of $M_{\gamma,g}$.


Clearly the function $M_\gamma(x,y) = x^{1-\gamma} y^{\gamma}$ is a bivariate weighted
quasi-arithmetic mean with generating function g(t) = log(t) (see Section 2.3),
whereas $M_\gamma(x,y) = (1 - \gamma)x + \gamma y$ is a bivariate weighted arithmetic mean
(with the generating function g(t) = t). This observation readily leads to the
consideration of a wider class of compensatory functions encompassing both
exponential and linear convex T-S functions:


Definition 4.50 (T-S function). Given $\gamma \in\, ]0,1[$, a t-norm T, a t-conorm
S and a continuous strictly monotone function $g : [0,1] \to [-\infty, \infty]$ such that
$\{g(0), g(1)\} \neq \{-\infty, +\infty\}$, the corresponding T-S function is defined as

$$Q_{\gamma,T,S,g}(x_1, \ldots, x_n) = g^{-1}\left((1 - \gamma) \cdot g(T(x_1, \ldots, x_n)) + \gamma \cdot g(S(x_1, \ldots, x_n))\right).$$

The function g is called a generating function of $Q_{\gamma,T,S,g}$.


Note 4.51. An alternative definition allows the parameter $\gamma$ to range over the
whole interval [0, 1], and thus includes t-norms (when $\gamma = 0$) and t-conorms
(when $\gamma = 1$) as special limiting cases of T-S functions. Observe also that
Definition 4.50 excludes, for simplicity, generating functions such that
$\{g(0), g(1)\} = \{-\infty, +\infty\}$.


Note 4.52. Due to their construction, and similarly to weighted quasi-arithmetic
means (see Section 2.3), different generating functions can lead to the same T-S
function: in particular, if h(t) = ag(t) + b, where $a, b \in \mathbb{R}$, $a \neq 0$, a simple
calculation shows that $Q_{\gamma,T,S,h} = Q_{\gamma,T,S,g}$.


Exponential convex T-S functions are then nothing but T-S functions with
generating function g(t) = log(t) (or, more generally, g(t) = a log(t) + b with
$a, b \in \mathbb{R}$, $a \neq 0$, see Note 4.52), whereas linear convex T-S functions are
obtained from the class of T-S functions by choosing g(t) = t (or g(t) = at + b
with $a, b \in \mathbb{R}$, $a \neq 0$).
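The relationship between the general T-S function and its exponential and
linear special cases can be sketched in a few lines of Python. This is an
illustration under stated assumptions rather than a reference implementation:
the caller supplies the generating function g together with its inverse, the
helper names are hypothetical, and log(0) is extended to $-\infty$ so that
g(t) = log(t) can be used when the t-norm returns 0. `math.prod` requires
Python 3.8 or later.

```python
import math


def t_product(xs):
    """Product t-norm T_P."""
    return math.prod(xs)


def s_probsum(xs):
    """Dual t-conorm S_P (probabilistic sum)."""
    return 1.0 - math.prod(1.0 - x for x in xs)


def safe_log(t):
    return math.log(t) if t > 0 else -math.inf


def ts_function(xs, gamma, t_norm, s_conorm, g, g_inv):
    """Q(x) = g^{-1}((1 - gamma) * g(T(x)) + gamma * g(S(x)))."""
    t, s = t_norm(xs), s_conorm(xs)
    return g_inv((1.0 - gamma) * g(t) + gamma * g(s))


def exponential_convex(xs, gamma, t_norm, s_conorm):
    """T-S function with g = log, i.e. T(x)^(1-gamma) * S(x)^gamma."""
    return ts_function(xs, gamma, t_norm, s_conorm, safe_log, math.exp)


def linear_convex(xs, gamma, t_norm, s_conorm):
    """T-S function with g = identity, i.e. (1-gamma)*T(x) + gamma*S(x)."""
    return ts_function(xs, gamma, t_norm, s_conorm,
                       lambda t: t, lambda t: t)


def gamma_operator(xs, gamma):
    """Gamma operator: exponential convex combination of T_P and S_P."""
    return exponential_convex(xs, gamma, t_product, s_probsum)
```

Passing an affine transformation of g (for example g(t) = 2t + 1 with inverse
(s - 1)/2) to `ts_function` reproduces the linear convex result, in line with
Note 4.52.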

Observe finally that all the above definitions have been given for inputs of
any dimension n. This is necessary because the corresponding bivariate
functions are not associative. T-S functions may then be used to easily construct
extended aggregation functions (as defined on p. 4) sharing the same
parameters $(\gamma, T, S, g)$ for inputs of any dimension n > 1:

Definition 4.53 (Extended T-S function). Given $\gamma \in\, ]0,1[$, a t-norm T, a
t-conorm S and a continuous strictly monotone function $g : [0,1] \to [-\infty, \infty]$
such that $\{g(0), g(1)\} \neq \{-\infty, +\infty\}$, the corresponding extended T-S function,

$$Q_{\gamma,T,S,g} : \bigcup_{n \in \{1,2,\ldots\}} [0,1]^n \to [0,1],$$

is defined as $Q_{\gamma,T,S,g}(t) = t$ for $t \in [0,1]$, and, for any n > 1, as

$$Q_{\gamma,T,S,g}(x_1, \ldots, x_n) = g^{-1}\left((1 - \gamma) \cdot g(T(x_1, \ldots, x_n)) + \gamma \cdot g(S(x_1, \ldots, x_n))\right).$$


4.5.2 Main properties 


T-S functions are obviously symmetric and not associative. Other interesting 
properties of these functions are summarized below: 


Bounds Any T-S function is bounded by the t-norm and the t-conorm it is 
built upon, i.e., the following inequality holds: 


$$T \leq Q_{\gamma,T,S,g} \leq S.$$


Comparison

    • If $\gamma_1 \leq \gamma_2$, then $Q_{\gamma_1,T,S,g} \leq Q_{\gamma_2,T,S,g}$.
    • If $T_1 \leq T_2$, then $Q_{\gamma,T_1,S,g} \leq Q_{\gamma,T_2,S,g}$.
    • If $S_1 \leq S_2$, then $Q_{\gamma,T,S_1,g} \leq Q_{\gamma,T,S_2,g}$.

    Regarding the comparison of T-S functions just differing in their
    generating functions, it suffices to apply the results on the comparison of
    weighted quasi-arithmetic means given in Section 2.3.

Absorbing element T-S functions have an absorbing element if and only if
    one of the two following situations holds:

    • The generating function g verifies $g(0) = \pm\infty$, in which case the
      absorbing element is a = 0.
    • The generating function g verifies $g(1) = \pm\infty$, and then the absorbing
      element is a = 1.

    Note that this entails that exponential convex T-S functions (in
    particular, gamma operators) have absorbing element a = 0, whereas linear
    convex T-S functions do not have an absorbing element.

Neutral element Neither T-S functions nor extended T-S functions possess a
    neutral element.

Duality The class of T-S functions is closed under duality, that is, the dual
    of a function $Q_{\gamma,T,S,g}$ with respect to an arbitrary strong negation N is
    also a T-S function. Namely, it is given by $Q_{1-\gamma,S_d,T_d,g_d}$, where $S_d$ is the
    t-norm dual to S w.r.t. N, $T_d$ is the t-conorm dual to T w.r.t. N and
    $g_d = g \circ N$, that is:

$$S_d(x_1, \ldots, x_n) = N(S(N(x_1), \ldots, N(x_n)))$$
$$T_d(x_1, \ldots, x_n) = N(T(N(x_1), \ldots, N(x_n)))$$
$$g_d(t) = g(N(t))$$

    Observe that as a corollary we have the following:

    • The dual of a T-S function with generating function verifying
      $g(0) = \pm\infty$ (respectively $g(1) = \pm\infty$) is a T-S function with generating
      function verifying $g_d(1) = \pm\infty$ (respectively $g_d(0) = \pm\infty$), and thus
      belonging to a different category. As a consequence, it is clear that these
      functions (in particular, exponential convex T-S functions) are never
      self-dual.

    • When dealing with functions $Q_{\gamma,T,S,g}$ such that $g(0), g(1) \neq \pm\infty$, the
      following result regarding self-duality is available:

      Proposition 4.54. Let $Q_{\gamma,T,S,g}$ be a T-S function such that
      $g(0), g(1) \neq \pm\infty$ and let N be the strong negation generated by g, i.e.,
      defined as $N(t) = g^{-1}(g(0) + g(1) - g(t))$. Then $Q_{\gamma,T,S,g}$ is self-dual
      w.r.t. N if and only if $\gamma = 1/2$ and (T, S) are dual to each other
      w.r.t. N.

      The above result can be applied, in particular, to linear convex
      functions $L_{\gamma,T,S}$, which appear to be self-dual w.r.t. the standard negation
      N(t) = 1 - t if and only if $\gamma = 1/2$ and T and S are dual to each other.
Idempotency T-S functions related to T = min and S = max are obviously
    idempotent, but these are not the only cases where T-S functions turn
    out to be averaging functions. For example, if n = 2, it is easy to
    check that the function $L_{1/2,T,S}$ is just the arithmetic mean whenever the
    pair (T, S) verifies the Frank functional equation T(x,y) + S(x,y) = x + y
    for all $(x,y) \in [0,1]^2$. This happens, in particular, when choosing the pair
    (T, S) in Frank's families of t-norms and t-conorms with the same
    parameter, i.e., taking $T = T^F_\lambda$ and $S = S^F_\lambda$ for some $\lambda \in [0, \infty]$.


    Note 4.55. T-S functions of the form $Q_{\gamma,\min,\max,g}$ are clearly idempotent for
    inputs of any dimension, but this is not necessarily the case when
    non-idempotent t-norms and t-conorms are used. For example, the function
    $L_{1/2,T_P,S_P}$ is idempotent for n = 2 (the pair $(T_P, S_P)$ verifies the Frank
    functional equation) but it is not idempotent for n = 3.
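The claim about the pair $(T_P, S_P)$ is easy to confirm numerically. The
following Python fragment is a small sketch (the function name is
hypothetical) that evaluates the linear convex T-S function with weight 1/2
on the diagonal, where an idempotent function must return its common input.

```python
import math


def linear_half_product(xs):
    """(T_P(x) + S_P(x)) / 2 with the product t-norm and its dual t-conorm."""
    t = math.prod(xs)
    s = 1.0 - math.prod(1.0 - x for x in xs)
    return 0.5 * (t + s)


# n = 2: (x^2 + 2x - x^2) / 2 = x, so the function is idempotent.
two = linear_half_product((0.25, 0.25))
# n = 3: (x^3 + 1 - (1 - x)^3) / 2 differs from x in general.
three = linear_half_product((0.25, 0.25, 0.25))
```

At x = 0.25 the bivariate value equals 0.25, whereas the ternary value is
0.296875.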


Continuity Since their generating functions are continuous and the cases
    where $\mathrm{Ran}(g) = [-\infty, +\infty]$ are excluded, T-S functions are continuous if
    and only if their associated functions T and S are continuous.


4.5.3 Examples 


Example 4.56. The linear combination of the minimum t-norm and the
Łukasiewicz t-conorm (see Figure 4.16 for a 3D plot with $\gamma = 0.3$) is
given by

$$L_{\gamma,\min,S_L}(x_1, \ldots, x_n) = (1 - \gamma) \min(x_1, \ldots, x_n) + \gamma \min\left(\sum_{i=1}^{n} x_i, 1\right).$$


Fig. 4.16. 3D plot of $L_{0.3,\min,S_L}$ (Example 4.56).


Example 4.57. Taking into account the general result on duality given in
Section 4.5.2, the dual function of a linear convex T-S function w.r.t. a strong
negation N is a T-S function generated by g(t) = N(t). T-S functions with
such a generating function are expressed as:

$$Q_{\gamma,T,S,N}(x_1, \ldots, x_n) = N\left((1 - \gamma) N(T(x_1, \ldots, x_n)) + \gamma N(S(x_1, \ldots, x_n))\right).$$


Note that if N is taken as the standard negation N(t) = 1 — t, it is 
Q,,7,5,N = Ly,7,5, i€., Qy,7,5,1-14a is a linear convex T-S function. Other- 
wise, the resulting functions are not linear combinations. To visualize a con- 
crete example, the choice of the strong negation N,(t) = 1 — (1 — t), p> 0 
provides the following parameterized family (see Figure [4.17] for the 3D plot 
of a member of this family obtained with y = 0.5, T = Tp, S = max and 


p = 3): 


Qay1.8,Np (1s +- Pn) =1= (1-7) IT (21, +- ,2n)P HCS (E1. -+3 2n) 


Example 4.58. Similarly to Example the duals of exponential convex 
T-S functions w.r.t. a strong negation N are T-S functions with generating 
function g(t) = log(V(t)). The class of T-S functions generated by log oN is 
given by: 
Fig. 4.17. 3D plot of $Q_{0.5,T_P,\max,N_3}$ in Example 4.57 (left) and
$Q_{0.8,\min,S_P,\log\circ(1-\mathrm{Id})}$ in Example 4.58 (right).


$$Q_{\gamma,T,S,\log\circ N}(x_1,\ldots,x_n) = N\big(N(T(x_1,\ldots,x_n))^{1-\gamma}\cdot N(S(x_1,\ldots,x_n))^{\gamma}\big).$$


It is easy to check that these functions are never exponential convex T-S 
functions. A simple example is obtained with the standard negation, that 
provides the following function (see Figure 4.17 for the 3D plot with γ = 0.8,
T = min and $S = S_P$):


$$Q_{\gamma,T,S,\log\circ(1-\mathrm{Id})}(x_1,\ldots,x_n) = 1 - (1 - T(x_1,\ldots,x_n))^{1-\gamma}\cdot(1 - S(x_1,\ldots,x_n))^{\gamma}.$$


Example 4.59. Choosing $g_p(t) = t^p$, $p \in\ ]-\infty, 0[\ \cup\ ]0,+\infty[$, the following
parameterized family of T-S functions is obtained (see Figure 4.18 for a 3D
plot with γ = 0.5, $T = T_P$, $S = S_L$ and p = 2):


$$Q_{\gamma,T,S,g_p}(x_1,\ldots,x_n) = \big((1-\gamma)\,T(x_1,\ldots,x_n)^p + \gamma\,S(x_1,\ldots,x_n)^p\big)^{1/p}.$$


Example 4.60. The choice T = min and S = max provides a wide family of
idempotent T-S functions (see Figure 4.18 for a 3D plot with γ = 0.2 and
$g(t) = t^2$):


$$Q_{\gamma,\min,\max,g}(x_1,\ldots,x_n) = g^{-1}\big((1-\gamma)\,g(\min(x_1,\ldots,x_n)) + \gamma\,g(\max(x_1,\ldots,x_n))\big).$$
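The families of Examples 4.59 and 4.60 share the general form $g^{-1}((1-\gamma)g(T) + \gamma g(S))$, which can be sketched as a single higher-order function. The code below is an illustrative sketch only: the helper names are assumptions, n-ary t-norms and t-conorms are passed as functions on lists, and the power generator is restricted to p > 0 to avoid evaluating negative powers at 0.

```python
import math


def ts_function(xs, gamma, t_norm, t_conorm, g, g_inv):
    """Q(x) = g^{-1}((1 - gamma) * g(T(x)) + gamma * g(S(x)))."""
    return g_inv((1.0 - gamma) * g(t_norm(xs)) + gamma * g(t_conorm(xs)))


def power_generator(p):
    """Return g_p(t) = t**p and its inverse; this sketch only handles p > 0."""
    if p <= 0:
        raise ValueError("this sketch only handles p > 0")
    return (lambda t: t ** p), (lambda u: u ** (1.0 / p))


def product_tnorm(xs):
    """Product t-norm T_P."""
    return math.prod(xs)


def lukasiewicz_tconorm(xs):
    """Lukasiewicz t-conorm S_L."""
    return min(sum(xs), 1.0)


if __name__ == "__main__":
    g, g_inv = power_generator(2)
    # Example 4.59 setting: gamma = 0.5, T = T_P, S = S_L, p = 2
    print(ts_function([0.5, 0.5], 0.5, product_tnorm, lukasiewicz_tconorm, g, g_inv))
    # Example 4.60 setting: gamma = 0.2, T = min, S = max, g(t) = t^2
    print(ts_function([0.2, 0.9], 0.2, min, max, g, g_inv))
```

With T = min and S = max the result is idempotent for any generator, which can be checked by passing a constant input vector.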


4.5.4 Fitting to the data 


Fig. 4.18. 3D plot of the T-S function $Q_{0.5,T_P,S_L,g_2}$ in Example 4.59 (left) and
$Q_{0.2,\min,\max,g}$ with $g(t) = t^2$ in Example 4.60 (right).


Fitting T-S functions to a set of empirical data D involves: a) fitting the
unknown parameter γ; b) fitting the participating t-norm and t-conorm; c)
fitting the generator g; d) fitting all these parameters together. We start with
the simplest case of fitting γ ∈ [0,1], provided that T, S and g are given (or
fixed). Consider first linear convex T-S functions in Definition 4.48. Let us
rewrite this expression as


$$f(\mathbf{x}) = T(\mathbf{x}) + \gamma\,(S(\mathbf{x}) - T(\mathbf{x})).$$
Using the least squares criterion, we solve the problem


$$\text{Minimize}_{\gamma\in[0,1]}\ \sum_{k=1}^{K}\big(\gamma\,(S(\mathbf{x}_k) - T(\mathbf{x}_k)) + T(\mathbf{x}_k) - y_k\big)^2.$$


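The objective function is convex, and the explicit solution is given by


$$\gamma^* = \max\left\{0,\ \min\left\{1,\ \frac{\sum_{k=1}^{K}(y_k - T(\mathbf{x}_k))\,(S(\mathbf{x}_k) - T(\mathbf{x}_k))}{\sum_{k=1}^{K}(S(\mathbf{x}_k) - T(\mathbf{x}_k))^2}\right\}\right\}. \qquad (4.19)$$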
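The explicit solution can be sketched in a few lines. The code below assumes that the values T(x_k), S(x_k) and the targets y_k have already been computed; it returns the unconstrained least-squares minimizer of the sum of squares of γ(S − T) + T − y, namely Σ(y − T)(S − T) / Σ(S − T)², clipped to [0,1]. The function name and the handling of the degenerate case S = T on all data are assumptions of this sketch.

```python
def fit_gamma(t_values, s_values, targets):
    """Least-squares gamma for f = T + gamma * (S - T), clipped to [0, 1].

    t_values[k] = T(x_k), s_values[k] = S(x_k), targets[k] = y_k.
    """
    num = sum((y - t) * (s - t) for t, s, y in zip(t_values, s_values, targets))
    den = sum((s - t) ** 2 for t, s in zip(t_values, s_values))
    if den == 0.0:
        # T and S coincide on the data, so gamma has no effect;
        # returning 0 is an arbitrary choice made by this sketch.
        return 0.0
    return max(0.0, min(1.0, num / den))


if __name__ == "__main__":
    t = [0.1, 0.2, 0.5]
    s = [0.6, 0.9, 0.7]
    y = [ti + 0.3 * (si - ti) for ti, si in zip(t, s)]  # noiseless data, gamma = 0.3
    print(fit_gamma(t, s, y))
```

Because the objective is a convex quadratic in γ, clipping the unconstrained minimizer to [0,1] gives the constrained optimum.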
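For the exponential convex T-S functions, we need to linearize the outputs
by taking logarithms, in which case


$$\log(f(\mathbf{x})) = \log(T(\mathbf{x})) + \gamma\,\big(\log(S(\mathbf{x})) - \log(T(\mathbf{x}))\big).$$
Let us denote $\tilde T(\mathbf{x}) = \log(T(\mathbf{x}))$, $\tilde S(\mathbf{x}) = \log(S(\mathbf{x}))$ and $\tilde y = \log(y)$. Then we
have the minimization problem
$$\text{Minimize}_{\gamma\in[0,1]}\ \sum_{k=1}^{K}\big(\gamma\,(\tilde S(\mathbf{x}_k) - \tilde T(\mathbf{x}_k)) + \tilde T(\mathbf{x}_k) - \tilde y_k\big)^2,$$


and the explicit solution is given by (4.19) with $\tilde T$, $\tilde S$ and $\tilde y$ replacing T, S, y.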
Let now T and S be members of parametric families of t-norms and
t-conorms, with unknown parameters $p_T$, $p_S$. Then minimization of the least
squares criterion has to be performed with respect to three parameters $p_T$, $p_S$
and γ. While for fixed $p_T$, $p_S$ we have a convex optimization problem with
an explicit solution (4.19), fitting parameters $p_T$, $p_S$ involves a nonlinear
optimization problem with potentially multiple locally optimal solutions (see
the Appendix). Deterministic global optimization methods, like the Cutting
Angle method or grid search, usually work well in two dimensions. Thus we
will solve a bi-level optimization problem


$$\min_{p_T,\,p_S}\ \min_{\gamma\in[0,1]}\ \sum_{k=1}^{K}\big(\gamma\,(S_{p_S}(\mathbf{x}_k) - T_{p_T}(\mathbf{x}_k)) + T_{p_T}(\mathbf{x}_k) - y_k\big)^2,$$


where at the outer level we apply a global optimization method, and at the
inner level we use the explicit formula (4.19). Of course, if T and S are dual
to each other, the problem simplifies significantly.
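The bi-level scheme can be illustrated with the simplest outer method mentioned above, a grid search. The sketch below makes several assumptions that are not part of the text: bivariate data, the Schweizer-Sklar family in its standard parameterization with λ > 0 for both T and S (S being taken as the standard dual of a Schweizer-Sklar t-norm), and a small hand-picked grid. It is meant to show the structure of the computation, not a recommended solver.

```python
import itertools


def ss_tnorm(x, y, lam):
    """Schweizer-Sklar t-norm for lam > 0 (standard parameterization)."""
    return max(x ** lam + y ** lam - 1.0, 0.0) ** (1.0 / lam)


def ss_tconorm(x, y, lam):
    """Dual t-conorm with respect to the standard negation."""
    return 1.0 - ss_tnorm(1.0 - x, 1.0 - y, lam)


def fit_gamma(t_values, s_values, targets):
    """Inner level: explicit least-squares gamma, clipped to [0, 1]."""
    num = sum((y - t) * (s - t) for t, s, y in zip(t_values, s_values, targets))
    den = sum((s - t) ** 2 for t, s in zip(t_values, s_values))
    return max(0.0, min(1.0, num / den)) if den > 0.0 else 0.0


def fit_bilevel(data, grid):
    """Outer level: exhaustive grid search over (p_T, p_S).

    data is a list of ((x, y), target) pairs; returns (error, p_T, p_S, gamma).
    """
    targets = [y for _, y in data]
    best = None
    for p_t, p_s in itertools.product(grid, repeat=2):
        t_vals = [ss_tnorm(a, b, p_t) for (a, b), _ in data]
        s_vals = [ss_tconorm(a, b, p_s) for (a, b), _ in data]
        gamma = fit_gamma(t_vals, s_vals, targets)
        err = sum((t + gamma * (s - t) - y) ** 2
                  for t, s, y in zip(t_vals, s_vals, targets))
        if best is None or err < best[0]:
            best = (err, p_t, p_s, gamma)
    return best
```

For each grid point the inner problem costs only one pass over the data, so the total cost grows with the square of the grid size; a finer grid or a proper global optimizer would replace the outer loop in practice.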

Next, let us fit the generator g, having T, S fixed, but allowing γ to vary.
This problem is very similar to that of fitting generators of quasi-arithmetic
means in Section 2.3.7. If g is defined by an algebraic formula with a parameter
$p_g$, e.g., $g(t) = t^{p_g}$, then we have the following problem


$$\text{Minimize}_{\gamma\in[0,1],\,p_g}\ \sum_{k=1}^{K}\big(\gamma\,[g_{p_g}(S(\mathbf{x}_k)) - g_{p_g}(T(\mathbf{x}_k))] + g_{p_g}(T(\mathbf{x}_k)) - g_{p_g}(y_k)\big)^2.$$


Again, for a fixed $p_g$ the optimal γ can be found explicitly by appropriately
modifying (4.19), while the problem with respect to $p_g$ is nonlinear, and a
global optimization method (like Pijavski-Shubert) needs to be applied.

Consider now the case when g(t) is not defined parametrically, in which
case we could represent it with a monotone linear regression spline


$$g(t) = \sum_{j=1}^{J} c_j B_j(t),$$


$B_j(t)$ being modified B-splines. Then, similarly to weighted quasi-arithmetic
means, we have the problem of fitting spline coefficients $c_j$ and the weight
γ to data, subject to $c_j > 0$. In this case, given that the number of spline
coefficients J is greater than 1, it makes sense to pose a bi-level optimization
problem the other way around, i.e.,


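$$\min_{\gamma\in[0,1]}\ \min_{c_j>0}\ \sum_{k=1}^{K}\Big(\gamma\sum_{j=1}^{J} c_j B_j(S(\mathbf{x}_k)) + (1-\gamma)\sum_{j=1}^{J} c_j B_j(T(\mathbf{x}_k)) - \sum_{j=1}^{J} c_j B_j(y_k)\Big)^2.$$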


Next rearrange the terms of the sum to get


$$\min_{\gamma\in[0,1]}\ \min_{c_j>0}\ \sum_{k=1}^{K}\Big(\sum_{j=1}^{J} c_j\,\big[\gamma B_j(S(\mathbf{x}_k)) + (1-\gamma) B_j(T(\mathbf{x}_k)) - B_j(y_k)\big]\Big)^2.$$


Consider the inner problem. The expression in the square brackets does not
depend on $c_j$ for a fixed γ, so we can write


$$F_j(\mathbf{x}_k, y_k; \gamma) = \gamma B_j(S(\mathbf{x}_k)) + (1-\gamma) B_j(T(\mathbf{x}_k)) - B_j(y_k),$$


and the inner problem becomes


$$\min_{c_j>0}\ \sum_{k=1}^{K}\Big(\sum_{j=1}^{J} c_j F_j(\mathbf{x}_k, y_k; \gamma)\Big)^2.$$
This is a standard quadratic programming problem, and we solve it by QP
methods, discussed in the Appendix. The outer problem is now a general
nonlinear optimization problem, with possibly multiple local minima, and
we use a univariate global optimization method, like the Pijavski-Shubert
method.

Finally, if we have freedom of choice of all the parameters γ, $p_T$, $p_S$ and
g, then we have a global optimization problem with respect to all these
parameters. It makes sense to use bi-level optimization by putting either γ or $c_j$
(if g is represented by a spline) in the inner level, so that its solution is either
found explicitly or by solving a standard QP problem.


4.6 Symmetric sums 


Symmetric sums (or, more generally, N-symmetric sums, where N is a strong
negation) are nothing but self-dual (N-self-dual) aggregation functions¹⁹.

Recall from Chapter 1 that strong negations (Definition 1.48) may
be used to construct new aggregation functions from existing ones by reversing
the input scale: indeed (see Definition 1.54, p. 20), given a strong negation
N, to each aggregation function $f : [0,1]^n \to [0,1]$ there corresponds
another aggregation function $f_d : [0,1]^n \to [0,1]$, defined as
$f_d(x_1,\ldots,x_n) = N(f(N(x_1),\ldots,N(x_n)))$, which is called the N-dual of f.

Recall also that the classes of conjunctive and disjunctive functions are
dual to each other, that is, the N-dual of a conjunctive function is always
a disjunctive one, and vice versa. It is also easy to check²⁰ that the two
remaining classes of aggregation functions are closed under duality, i.e., the
N-dual of an averaging (respectively mixed) function is in turn an averaging


19 The original definition in included two additional axioms — continuity and 
symmetry — but these restrictions are normally no longer considered (see, e.g., 
ia) 

20 It suffices to notice that for any strong negation N, min and max are N-dual to
each other, and that $f \le g$ implies $g_d \le f_d$.




(respectively mixed) function. This means that N-self-dual aggregation
functions²¹ (i.e., those such that $f_d = f$, see Definition 1.55) can only be found
among averaging and mixed functions. Some N-self-dual functions, belonging
to the main families of averaging and mixed functions, have been identified in
different sections of Chapters 2 and 4. In particular, we have:


• Weighted quasi-arithmetic means with bounded generating function g are
N-self-dual with respect to the strong negation $N(t) = g^{-1}(g(0) + g(1) - g(t))$.
Particularly, any weighted arithmetic mean is self-dual with respect
to the standard negation (see Section 2.3.2).

• OWA functions with symmetric weighting vectors are self-dual with
respect to the standard negation (Section 2.5.2).

• Nullnorms $V_{T,S,a}$, where a is the negation's fixed point (i.e., the value such
that N(a) = a) and T and S are dual with respect to the strict negation
$\tilde N(t) = \frac{N(a\,t) - a}{1 - a}$. In particular, a-medians $V_{\min,\max,a}$ are N-self-dual with
respect to any N with fixed point a (Section 4.3.2).

• T-S-functions $Q_{0.5,T,S,g}$ such that $N(t) = g^{-1}(g(0) + g(1) - g(t))$ and (T, S)
are dual to each other w.r.t. N (Section 4.5.2).


The first studies of this kind of aggregation functions date back to 1980-s 
[zs [1901 2271, but their applications to the solution of different decision 
problems — such as preference modeling [1021 or multicriteria decision 
making 162- have renewed the interest in them (see recent publications [104, 


(160) ke1 ke5} [193)). 


4.6.1 Definitions 


In order to define symmetric sums (or N-symmetric sums) it suffices to
recall the definition of self-dual aggregation function (Definition 1.55). Let us
first consider the particular case of self-duality with respect to the standard
negation:


Definition 4.61 (Symmetric sum). A symmetric sum²² is an aggregation
function $f : [0,1]^n \to [0,1]$ which is self-dual with respect to the standard
negation, i.e., which verifies


$$f(x_1,\ldots,x_n) = 1 - f(1 - x_1,\ldots,1 - x_n)$$
for all $(x_1,\ldots,x_n) \in [0,1]^n$.


The above definition can be easily generalized to the case of arbitrary 
strong negations in the following way: 


21 Also known as N-invariant aggregation functions. 
22 Symmetric sums are also sometimes called reciprocal aggregation functions [od]. 
Definition 4.62 (N-symmetric sum). Given a strong negation N, an
N-symmetric sum is an aggregation function $f : [0,1]^n \to [0,1]$ which is self-dual
with respect to N, i.e., which verifies


$$f(x_1,\ldots,x_n) = N\big(f(N(x_1),\ldots,N(x_n))\big) \qquad (4.20)$$


for all $(x_1,\ldots,x_n) \in [0,1]^n$.


It is important to notice that, despite their name, N-symmetric sums are
not necessarily symmetric in the sense of Definition 1.16. For example,
weighted arithmetic means (excluding the special case of arithmetic means)
are prototypical examples of non-symmetric self-dual aggregation functions.
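Self-duality can be spot-checked numerically. The sketch below, with illustrative helper names that are not from the text, builds the N-dual of a function given as a callable on lists and compares it with the function itself on a regular grid; this is a numerical check under the chosen tolerance, not a proof. It confirms that a weighted arithmetic mean with unequal weights is self-dual with respect to the standard negation although it is not symmetric.

```python
import itertools


def standard_negation(t):
    return 1.0 - t


def n_dual(f, negation):
    """N-dual of f: x -> N(f(N(x_1), ..., N(x_n)))."""
    return lambda xs: negation(f([negation(x) for x in xs]))


def is_self_dual(f, negation, n, steps=10, tol=1e-9):
    """Compare f with its N-dual on a regular grid of [0, 1]^n."""
    grid = [i / steps for i in range(steps + 1)]
    dual = n_dual(f, negation)
    return all(abs(f(list(xs)) - dual(list(xs))) <= tol
               for xs in itertools.product(grid, repeat=n))


def weighted_mean(weights):
    """Weighted arithmetic mean with the given weights (assumed to sum to 1)."""
    return lambda xs: sum(w * x for w, x in zip(weights, xs))


if __name__ == "__main__":
    wam = weighted_mean([0.7, 0.3])
    print(is_self_dual(wam, standard_negation, 2))   # self-dual
    print(wam([0.2, 0.9]), wam([0.9, 0.2]))          # but not symmetric
    print(is_self_dual(min, standard_negation, 2))   # min is not self-dual
```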


4.6.2 Main properties 


N-symmetric sums are of course characterized by the N-self-duality equation
(4.20), but this equation does not provide any means for their construction.
More useful characterizations, showing how to construct N-symmetric sums
starting from arbitrary aggregation functions, are available. The first of these
characterizations can be stated as follows²³:


Proposition 4.63. Let $\varphi : [0,1] \to [0,1]$ be an automorphism and let N be
the strong negation generated by $\varphi$²⁴. A function $f : [0,1]^n \to [0,1]$ is an
N-symmetric sum if and only if there exists an aggregation function
$g : [0,1]^n \to [0,1]$ such that


$$f(x_1,\ldots,x_n) = \varphi^{-1}\left(\frac{g(x_1,\ldots,x_n)}{g(x_1,\ldots,x_n) + g(N(x_1),\ldots,N(x_n))}\right)$$
with convention $\frac{0}{0} = \frac{1}{2}$. The function g is called a generating function of f.


Proof. The sufficiency of the above characterization is obtained by choosing
$g = \varphi \circ f$, and the necessity is a matter of calculation.

Note 4.64. The situation $\frac{0}{0}$ will occur if and only if $g(x_1,\ldots,x_n) = 0$ and
$g(N(x_1),\ldots,N(x_n)) = 0$, and for such tuples $f(x_1,\ldots,x_n) = \varphi^{-1}(1/2)$, i.e., the
value of the aggregation coincides with the negation's fixed point (see Remark
1.53). This will happen, for example, in the case of generating functions with
absorbing element 0 and tuples $(x_1,\ldots,x_n)$ such that $\min(x_1,\ldots,x_n) = 0$ and
$\max(x_1,\ldots,x_n) = 1$.


A characterization for standard symmetric sums is then obtained as a
simple corollary of Proposition 4.63, choosing N(t) = 1 - t:
23 See [s3], or and for similar results. 


24 That is, $N(t) = N_\varphi(t) = \varphi^{-1}(1 - \varphi(t))$ (see characterization 1.51).
Corollary 4.65. A function $f : [0,1]^n \to [0,1]$ is a symmetric sum if and
only if there exists an aggregation function $g : [0,1]^n \to [0,1]$ such that
$$f(x_1,\ldots,x_n) = \frac{g(x_1,\ldots,x_n)}{g(x_1,\ldots,x_n) + g(1 - x_1,\ldots,1 - x_n)}$$
with convention $\frac{0}{0} = \frac{1}{2}$.


Note 4.66. When using the above characterization for building new symmetric sums, 
the generating function g cannot be itself self-dual, because this would imply f = g. 
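The construction of Corollary 4.65 translates directly into code. The following is a minimal sketch, assuming the generating function g is passed as a callable on lists of values in [0,1]; the names are illustrative. It builds the symmetric sums generated by the minimum and by the product.

```python
import math


def symmetric_sum_from_ratio(g):
    """Build f(x) = g(x) / (g(x) + g(1 - x)), with the convention 0/0 = 1/2."""
    def f(xs):
        a = g(xs)
        b = g([1.0 - x for x in xs])
        if a == 0.0 and b == 0.0:
            return 0.5
        return a / (a + b)
    return f


min_based = symmetric_sum_from_ratio(min)
product_based = symmetric_sum_from_ratio(math.prod)

if __name__ == "__main__":
    print(min_based([0.2, 0.6]))       # min(x) / (min(x) + min(1 - x))
    print(min_based([0.0, 1.0]))       # the 0/0 case
    print(product_based([0.8, 0.6]))
```

Self-duality can be checked on any input x by verifying that f(x) + f(1 - x) = 1 up to rounding.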


A different characterization of N-symmetric sums is the following²⁵:


Proposition 4.67. Let $\varphi : [0,1] \to [0,1]$ be an automorphism and let N be
the strong negation generated by $\varphi$. A function $f : [0,1]^n \to [0,1]$ is an
N-symmetric sum if and only if there exists an aggregation function
$g : [0,1]^n \to [0,1]$ such that


$$f(x_1,\ldots,x_n) = \varphi^{-1}\left(\frac{g(x_1,\ldots,x_n) + 1 - g(N(x_1),\ldots,N(x_n))}{2}\right).$$


Proof. Similar to the proof of Proposition 4.63.


Again, we obtain, as a corollary, the following method for constructing 
symmetric sums: 


Corollary 4.68. A function $f : [0,1]^n \to [0,1]$ is a symmetric sum if and
only if there exists an aggregation function $g : [0,1]^n \to [0,1]$ such that


$$f(x_1,\ldots,x_n) = \frac{g(x_1,\ldots,x_n) + 1 - g(1 - x_1,\ldots,1 - x_n)}{2}.$$


Note 4.69. As it happens in the case of Corollary 4.65 (see Note 4.66), choosing a
self-dual generating function g would not provide any new symmetric sum.


Note 4.70. Note that the two characterizations given in Propositions 4.63 and 4.67
are similar in that they both are of the form


$$f(x_1,\ldots,x_n) = h\big(g(x_1,\ldots,x_n),\ g_d(x_1,\ldots,x_n)\big) \qquad (4.21)$$


that is, N-symmetric sums f are in both cases obtained as the composition (see
Section 1.5) of an aggregation function g and its N-dual $g_d$ by means of a bivariate
aggregation function h. Indeed, this can be easily checked choosing


$$h(x,y) = \varphi^{-1}\left(\frac{x}{x + N(y)}\right)$$


to obtain Proposition 4.63 ($h(x,y) = \frac{x}{x + 1 - y}$ for Corollary 4.65) and


25 This is just a generalization of the characterization given in [og 
$$h(x,y) = \varphi^{-1}\left(\frac{x + 1 - N(y)}{2}\right)$$
for Proposition 4.67 (the arithmetic mean for Corollary 4.68). Moreover, these two
bivariate functions are not the only ones allowing to build N-symmetric sums from
equation (4.21). Actually, any bivariate aggregation function $h : [0,1]^2 \to [0,1]$
verifying $h(x,y) = N(h(N(y), N(x)))$ and reaching every element of $[0, \varphi^{-1}(1/2)[$ is
suitable. A simple example of such a function h is given by the quasi-arithmetic
mean generated by $\varphi$, i.e., the function


$$h(x,y) = \varphi^{-1}\left(\frac{\varphi(x) + \varphi(y)}{2}\right).$$
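The composition f = h(g, g_d) can be sketched for the standard negation, where the automorphism is the identity. In the code below the helper names are illustrative assumptions; `h_ratio` is the bivariate function x/(x + 1 - y) associated with Corollary 4.65 and `h_mean` is the arithmetic mean associated with Corollary 4.68.

```python
import math


def dual(g):
    """Dual of g with respect to the standard negation N(t) = 1 - t."""
    return lambda xs: 1.0 - g([1.0 - x for x in xs])


def compose(h, g):
    """f = h(g, g_d): the composition of g and its dual by a bivariate h."""
    g_d = dual(g)
    return lambda xs: h(g(xs), g_d(xs))


def h_ratio(x, y):
    """h(x, y) = x / (x + 1 - y), with the convention 0/0 = 1/2."""
    den = x + 1.0 - y
    return 0.5 if x == 0.0 and den == 0.0 else x / den


def h_mean(x, y):
    """Arithmetic mean of x and y."""
    return (x + y) / 2.0


if __name__ == "__main__":
    f_ratio = compose(h_ratio, math.prod)  # product-generated, ratio form
    f_mean = compose(h_mean, math.prod)    # product-generated, mean form
    print(f_ratio([0.8, 0.6]), f_mean([0.8, 0.6]))
```

Both bivariate functions are idempotent, so the resulting symmetric sums lie between g and its dual on every input.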


Note 4.71. Observe that if g is itself N-self-dual and h is idempotent, then the
construction method given in equation (4.21) is not useful, since it ends up with
f = g (this happens, in particular, in the case of Corollaries 4.65 and 4.68, see Notes
4.66 and 4.69).


Note 4.72. The generating functions of Propositions 4.63 and 4.67 are not
necessarily unique, that is, different functions g can lead to the same N-symmetric sum.
A straightforward example of this appears in the case of Corollary 4.68, where any
aggregation function g generates the same symmetric sum as its dual function $g_d$²⁶.
Thus, for each characterization, generating functions can be grouped into
equivalence classes, each one containing all the aggregation functions that generate the
same N-self-dual function²⁷.


Other interesting properties of N-symmetric sums are the following: 


Bounds The bounds of some classes of N-symmetric sums, namely, those
built as in Note 4.70, $f = h(g, g_d)$, with the additional condition of the
idempotency of h, are easily calculated. Indeed, if h is idempotent, then
$\min(x,y) \le h(x,y) \le \max(x,y)$, and, consequently:


$$\min(g(\mathbf{x}), g_d(\mathbf{x})) \le f(\mathbf{x}) \le \max(g(\mathbf{x}), g_d(\mathbf{x})).$$


Note that the above inequality holds, in particular, in the cases of
Corollaries 4.65 and 4.68, since both use idempotent functions h.

Symmetry Recall from Section 4.6.1 that N-symmetric sums do not need
to be symmetric. Nevertheless, when n = 2 and f is a symmetric
N-symmetric sum, the following special property holds:


$$f(t, N(t)) = t_N \quad \forall t \in [0,1],$$


where $t_N$ is the negation's fixed point. This implies, in particular,
$f(0,1) = f(1,0) = t_N$. When N is taken as the standard negation, the above
equation is written as $f(t, 1 - t) = \frac{1}{2}$.

26 Note that this does not happen in the case of Corollary 4.65.


27 For more details on this, see [161].
Absorbing element N-symmetric sums may or may not possess an absorbing
element, but if they do then it must necessarily coincide with $t_N$,
the negation's fixed point²⁸. This is the case, for example, of N-self-dual
nullnorms, as was already pointed out in Section 4.3.2.

It is not difficult to explicitly build N-symmetric sums with the absorbing
element $t_N$. For example, both Corollary 4.65 and Corollary 4.68,
when used with generating functions having absorbing element 1/2, provide
symmetric sums with the same absorbing element. More generally,
it is easy to check that the construction $h(g, g_d)$ mentioned in Note 4.70,
when used with a generating function g with absorbing element $t_N$ and a
function h that is, in addition, idempotent, leads to an N-symmetric sum
with absorbing element $t_N$. If $\varphi$ is the generator of N, an example of this
may be obtained by choosing h as the quasi-arithmetic mean generated
by $\varphi$, and g as any nullnorm with absorbing element $t_N = \varphi^{-1}(1/2)$.

On the other hand, the fact that the absorbing element must necessarily
coincide with $t_N$ excludes any aggregation function with absorbing element
in {0,1}, such as: weighted quasi-arithmetic means with unbounded
generating functions, e.g., the geometric and harmonic means (see Section
2.2); uninorms (Section 4.2); T-S-functions (Section 4.5) with unbounded
generating functions, e.g., any gamma operator (Definition 4.46).

Neutral element Similarly to the absorbing element, if an N-symmetric
sum has a neutral element, then it is the negation's fixed point. Observe
that any N-symmetric sum built as in Note 4.70 by means of a generating
function having neutral element $t_N$ and an idempotent function h, has the
same neutral element $t_N$. Examples of N-symmetric sums with neutral
element built in a different way can be found below in Section 4.6.3.

Idempotency Clearly, some N-symmetric sums are idempotent (those
belonging to the class of averaging functions) while others are not (the mixed
ones). For example, Corollaries 4.65 and 4.68 easily end up with idempotent
symmetric sums as long as an idempotent generating function is used.
This is not always the case with the characterizations given in
Propositions 4.63 and 4.67, but it is true when using the construction $h(g, g_d)$ of
Note 4.70 with g and h idempotent.

Shift-invariance Shift-invariant (Definition[.45) symmetric sums have proved 
their usefulness [103]. Such functions can be built, in particular, by means 
of Corollary [4.68] starting from an arbitrary shift-invariant generating 
function g 


4.6.3 Examples 


Some prominent classes of N-symmetric sums are presented below. 


28 The proof of this is immediate from the definition of f and the uniqueness of the
fixed point of strong negations (see Remark 1.53).
T-norm-based N-symmetric sums


The two propositions given in the previous section (or, more generally, the
construction method mentioned in Note 4.70), along with the corresponding
corollaries for the case of the standard negation, constitute powerful tools for
building N-symmetric sums of different kinds and with different properties.
Let us see what happens, in particular, when choosing a t-norm (Chapter 3)
as a generating function [83].

Given a strong negation $N = N_\varphi$ with fixed point $t_N$, Proposition 4.63,
choosing g = T where T is a t-norm, provides the N-symmetric sum defined
as


$$f(x_1,\ldots,x_n) = \varphi^{-1}\left(\frac{T(x_1,\ldots,x_n)}{T(x_1,\ldots,x_n) + T(N(x_1),\ldots,N(x_n))}\right)$$
whenever $T(x_1,\ldots,x_n) \ne 0$ or $T(N(x_1),\ldots,N(x_n)) \ne 0$, and
$$f(x_1,\ldots,x_n) = t_N$$
otherwise.
Example 4.73. T = min provides the following idempotent N-symmetric sum:
$$f(x_1,\ldots,x_n) = \begin{cases} \varphi^{-1}\left(\dfrac{\min(x_i)}{\min(x_i) + \min(N(x_i))}\right), & \text{if } \min(x_i) \ne 0 \text{ or } \max(x_i) \ne 1,\\[2mm] t_N & \text{otherwise.}\end{cases}$$


In particular, when N is the standard negation, Corollary 4.65 allows one
to build t-norm-based symmetric sums given by


$$f(x_1,\ldots,x_n) = \frac{T(x_1,\ldots,x_n)}{T(x_1,\ldots,x_n) + T(1 - x_1,\ldots,1 - x_n)}$$


whenever $T(x_1,\ldots,x_n) \ne 0$ or $T(1 - x_1,\ldots,1 - x_n) \ne 0$, and


$$f(x_1,\ldots,x_n) = 1/2$$


otherwise. Observe²⁹ that these functions verify the inequality $T \le f \le T_d$
and, consequently, are idempotent when choosing T = min.


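Example 4.74. When n = 2, the following min-based bivariate symmetric sum
is obtained (see Figure 4.19 for a 3D plot):


$$f(x,y) = \begin{cases} \dfrac{\min(x,y)}{\min(x,y) + \min(1-x,1-y)}, & \text{if } \{x,y\} \ne \{0,1\},\\[2mm] \dfrac{1}{2} & \text{otherwise.}\end{cases}$$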





29 See the note on bounds in the previous section. 
Fig. 4.19. 3D plots of the min-based idempotent symmetric sum in Example 4.74
and the $T_P$-based symmetric sum in Example 4.75.


Example 4.75. Choosing the product t-norm $T_P$ as the generating function in
Corollary 4.65, the following symmetric sum is obtained:


$$f(x_1,x_2,\ldots,x_n) = \frac{\prod_{i=1}^{n} x_i}{\prod_{i=1}^{n} x_i + \prod_{i=1}^{n}(1 - x_i)},$$
with the convention $\frac{0}{0} = \frac{1}{2}$. Except for the tuples $(x_1,\ldots,x_n)$ such that
$\{0,1\} \subseteq \{x_1,\ldots,x_n\}$, that verify $f(x_1,\ldots,x_n) = \frac{1}{2}$, this function coincides
with the representable uninorm known as the 3-Π function (Example 4.19,
p. 209).


Note now that Proposition 4.67 also allows one to construct t-norm-based
N-symmetric sums given by


$$f(x_1,\ldots,x_n) = \varphi^{-1}\left(\frac{T(x_1,\ldots,x_n) + 1 - T(N(x_1),\ldots,N(x_n))}{2}\right),$$


where T is an arbitrary t-norm. In the case of the standard negation (i.e.,
when applying Corollary 4.68) the above functions become symmetric sums
of the form


$$f(x_1,\ldots,x_n) = \frac{T(x_1,\ldots,x_n) + T_d(x_1,\ldots,x_n)}{2},$$


which are nothing but the arithmetic mean of a t-norm and its dual t-conorm,
that is, linear T-S-functions $L_{1/2,T,T_d}$ (see Definition 4.48). The choice T =
min recovers the OWA function (see Definition [. 2) with weights w1 = wn = 
1/2 and w; = 0 otherwise. Recall also (see the idempotency item in Section 
4.5.2) that when n = 2, the choices T = min, $T = T_P$ and $T = T_L$ all lead to
the arithmetic mean, since the dual pairs (min, max), $(T_P, S_P)$ and $(T_L, S_L)$
verify Frank's functional equation T(x,y) + S(x,y) = x + y.


Example 4.76. Choosing the Schweizer-Sklar t-norm with parameter λ = 2
(see p. 150) the following bivariate t-norm-based symmetric sum is obtained
(see Figure 4.20 for a 3D plot):








jg eS ee Se ee 





Fig. 4.20. 3D plot of the Schweizer-Sklar t-norm-based symmetric sum given in
Example 4.76 (left) and of the max-based symmetric sum given in Example 4.77
(right).


T-conorm-based N-symmetric sums 


The t-norms used in the previous section can of course be replaced by their
dual functions, t-conorms. If $N = N_\varphi$ is a strong negation and S is an
arbitrary t-conorm, Proposition 4.63 provides the N-symmetric sum defined
as


$$f(x_1,\ldots,x_n) = \varphi^{-1}\left(\frac{S(x_1,\ldots,x_n)}{S(x_1,\ldots,x_n) + S(N(x_1),\ldots,N(x_n))}\right).$$
When N is taken as the standard negation, Corollary 4.65 provides
t-conorm-based symmetric sums of the form


$$f(x_1,\ldots,x_n) = \frac{S(x_1,\ldots,x_n)}{S(x_1,\ldots,x_n) + S(1 - x_1,\ldots,1 - x_n)},$$


which verify $S_d \le f \le S$ and are idempotent whenever S = max.
Fig. 4.21. The structure (averaging regions, where min ≤ f ≤ max) and a 3D plot
of the $S_P$-based symmetric sum given in Example 4.77.


Example 4.77. When n = 2 and N(t) = 1 - t, the following max-based and
$S_P$-based bivariate symmetric sums are obtained (see Figures 4.20 (right) and
4.21 for 3D plots of both functions):


$$f(x,y) = \frac{\max(x,y)}{\max(x,y) + \max(1-x,1-y)} \quad \text{(max-based)},$$


$$f(x,y) = \frac{x + y - xy}{1 + x + y - 2xy} \quad (S_P\text{-based}).$$
On the other hand, Proposition 4.67 leads to N-symmetric sums of the
form


$$f(x_1,\ldots,x_n) = \varphi^{-1}\left(\frac{S(x_1,\ldots,x_n) + 1 - S(N(x_1),\ldots,N(x_n))}{2}\right).$$


When dealing with the standard negation, these functions coincide with the
ones generated by their dual t-norms (see Note 4.72), that is, they are linear
T-S functions.


Almost associative N-symmetric sums 


If N is a strong negation with fixed point $t_N$ and $u : [0,1] \to [-\infty,+\infty]$ is a
strictly increasing bijection, such that $u(t_N) = 0$ and verifying
$u(N(t)) + u(t) = 0$ for all $t \in [0,1]$, then the function given by


$$f(x_1,\ldots,x_n) = \begin{cases} u^{-1}\left(\sum_{i=1}^{n} u(x_i)\right), & \text{if } \{0,1\} \not\subseteq \{x_1,\ldots,x_n\},\\[2mm] t_N & \text{otherwise}\end{cases}$$


is an N-symmetric sum with neutral element $t_N$ which is, in addition,
associative and continuous except for the tuples simultaneously containing the
values 0 and 1. Note that this kind of N-symmetric sums coincide with
representable uninorms (Section 4.2.3) except for the tuples $(x_1,\ldots,x_n)$ such that
$\{0,1\} \subseteq \{x_1,\ldots,x_n\}$.
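As a concrete instance of this construction, take the standard negation (fixed point 1/2) and the function u(t) = log(t/(1 - t)), which is a strictly increasing bijection onto the extended reals with u(1/2) = 0 and u(1 - t) = -u(t). This choice of u is an assumption made for illustration; with it, the resulting function agrees with the product-based symmetric sum built from Corollary 4.65. A minimal sketch:

```python
import math


def u(t):
    """u(t) = log(t / (1 - t)): u(1/2) = 0 and u(1 - t) = -u(t)."""
    if t == 0.0:
        return -math.inf
    if t == 1.0:
        return math.inf
    return math.log(t / (1.0 - t))


def u_inv(s):
    """Inverse of u (the logistic function), evaluated without overflow."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def almost_associative_sum(xs):
    """f(x) = u^{-1}(sum_i u(x_i)), or 1/2 when both 0 and 1 occur among the inputs."""
    if 0.0 in xs and 1.0 in xs:
        return 0.5
    return u_inv(sum(u(x) for x in xs))


if __name__ == "__main__":
    print(almost_associative_sum([0.8, 0.6]))       # equals 0.48 / (0.48 + 0.08)
    print(almost_associative_sum([0.3, 0.5]))       # 1/2 acts as neutral element
    print(almost_associative_sum([0.0, 1.0, 0.4]))  # exceptional tuples give 1/2
```

The explicit test for tuples containing both 0 and 1 avoids the undefined sum of -∞ and +∞.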


4.6.4 Fitting to the data 


Fitting symmetric sums to the data is based on their representation theorems
(Propositions 4.63 and 4.67). Typically one has to identify the parameters of
the participating aggregation function g, which could be a quasi-arithmetic
mean, a t-norm or a t-conorm. This is a nonlinear global optimization problem,
which in the case of one or two parameters can be solved using the Cutting Angle
method (see Appendix A.5.5).

Special classes of symmetric sums are some means, T-S functions and
(up to the inputs $(x_1,\ldots,x_n)$ such that $\{0,1\} \subseteq \{x_1,\ldots,x_n\}$) representable
uninorms. Therefore special methods developed for these types of functions
in Sections 2.3.7, 4.2.6 and 4.5.4 can all be applied.


4.7 ST-OWA functions 


Ordered Weighted Averaging (OWA) functions (see Chapter 2) have recently
been mixed with t-norms and t-conorms (Chapter 3) in order to provide new
mixed aggregation functions known as T-OWAs, S-OWAs and ST-OWAs.
These functions have proved to be useful, in particular, in the context of


multicriteria decision making (311 paa, 273}. 


4.7.1 Definitions 


Recall from Chapters 1 and 2 that given a weighting vector³⁰
$\mathbf{w} = (w_1,\ldots,w_n) \in [0,1]^n$, an OWA is an aggregation function defined as


$$OWA_{\mathbf{w}}(x_1,\ldots,x_n) = \sum_{i=1}^{n} w_i\, x_{(i)} \qquad (4.22)$$
where $\mathbf{x}_{\searrow} = (x_{(1)},\ldots,x_{(n)})$ is the vector obtained from $\mathbf{x}$ by arranging its
components in non-increasing order. Due to this ordering, it is clear
that for any $i \in \{1,\ldots,n\}$, $x_{(i)} = \min(x_{(1)},\ldots,x_{(i)}) = \max(x_{(i)},\ldots,x_{(n)})$,
and then Equation (4.22) can be rewritten either as


$$OWA_{\mathbf{w}}(x_1,\ldots,x_n) = \sum_{i=1}^{n} w_i \cdot \min(x_{(1)},\ldots,x_{(i)}), \qquad (4.23)$$


30 Definition 1.66.
or, equivalently, as
$$OWA_{\mathbf{w}}(x_1,\ldots,x_n) = \sum_{i=1}^{n} w_i \cdot \max(x_{(i)},\ldots,x_{(n)}). \qquad (4.24)$$
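The three equivalent expressions of an OWA function can be compared directly. The sketch below, with illustrative function names, sorts the inputs in non-increasing order and evaluates the plain weighted sum, the form with partial minima over the first i ordered inputs, and the form with partial maxima over the last n - i + 1 ordered inputs.

```python
def owa(weights, xs):
    """OWA: weights applied to the inputs sorted in non-increasing order."""
    ordered = sorted(xs, reverse=True)
    return sum(w * x for w, x in zip(weights, ordered))


def owa_min_form(weights, xs):
    """Equivalent form: x_(i) written as min(x_(1), ..., x_(i))."""
    ordered = sorted(xs, reverse=True)
    return sum(w * min(ordered[:i]) for i, w in enumerate(weights, start=1))


def owa_max_form(weights, xs):
    """Equivalent form: x_(i) written as max(x_(i), ..., x_(n))."""
    ordered = sorted(xs, reverse=True)
    return sum(w * max(ordered[i - 1:]) for i, w in enumerate(weights, start=1))


if __name__ == "__main__":
    w = [0.2, 0.5, 0.3]
    xs = [0.4, 0.9, 0.1]
    print(owa(w, xs), owa_min_form(w, xs), owa_max_form(w, xs))
```

The min and max forms are the ones that generalize: replacing `min` by a t-norm or `max` by a t-conorm gives the functions defined next.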


Since the min function that appears in Equation (4.23) belongs to the
class of t-norms (see Chapter 3), a simple generalization of OWA functions is
obtained by replacing this minimum operation by an arbitrary t-norm T:


Definition 4.78 (T-OWA function). Let $\mathbf{w} \in [0,1]^n$ be a weighting vector
and let T be a t-norm³¹. The aggregation function $O_{T,\mathbf{w}} : [0,1]^n \to [0,1]$
defined as


$$O_{T,\mathbf{w}}(x_1,\ldots,x_n) = \sum_{i=1}^{n} w_i \cdot T(x_{(1)},\ldots,x_{(i)}),$$


where $(x_{(1)},\ldots,x_{(n)}) = \mathbf{x}_{\searrow}$, is called a T-OWA.
Note 4.79. The following special cases should be noted:


• If $\mathbf{w} = (1,0,\ldots,0)$, then $O_{T,\mathbf{w}} = OWA_{\mathbf{w}} = \max$.
• If $\mathbf{w} = (0,\ldots,0,1)$, then $O_{T,\mathbf{w}} = T$.


Note 4.80. Bivariate T-OWA functions are just linear convex T-S functions
(Definition 4.48) with S = max, that is, when n = 2, $O_{T,\mathbf{w}} = L_{w_1,T,\max}$.


Of course, OWA functions are T-OWAs obtained when choosing T = min,
that is, $OWA_{\mathbf{w}} = O_{\min,\mathbf{w}}$ for any weighting vector $\mathbf{w}$. Similarly, the max
function in (4.24) can be replaced by an arbitrary t-conorm S, and this results
in an S-OWA:


Definition 4.81 (S-OWA function). Let $\mathbf{w} \in [0,1]^n$ be a weighting vector
and let S be a t-conorm. The aggregation function $O_{S,\mathbf{w}} : [0,1]^n \to [0,1]$
defined as


$$O_{S,\mathbf{w}}(x_1,\ldots,x_n) = \sum_{i=1}^{n} w_i \cdot S(x_{(i)},\ldots,x_{(n)}),$$


where $(x_{(1)},\ldots,x_{(n)}) = \mathbf{x}_{\searrow}$, is called an S-OWA.
Note 4.82. The following special cases should be noted:


• If $\mathbf{w} = (1,0,\ldots,0)$, then $O_{S,\mathbf{w}} = S$.
• If $\mathbf{w} = (0,\ldots,0,1)$, then $O_{S,\mathbf{w}} = OWA_{\mathbf{w}} = \min$.


31 Recall from Chapter 3 that t-norms are bivariate functions which are associative,
and are, therefore, uniquely determined for any number of arguments (with the
convention T(t) = t for any t ∈ [0,1]).
Note 4.83. Similarly to the case of T-OWAs (Note 4.80), bivariate S-OWA functions
are nothing but linear convex T-S functions (Definition 4.48) with T = min, i.e.,
when n = 2, $O_{S,\mathbf{w}} = L_{w_1,\min,S}$.


Again, OWA functions are just particular cases of S-OWAs (obtained by
choosing S = max), so T-OWAs and S-OWAs both generalize OWA functions.
Note, however, that they do it in opposite directions, since (see next section)
T-OWAs are always weaker than OWAs whereas S-OWAs are stronger, that is,
the inequality $O_{T,\mathbf{w}} \le OWA_{\mathbf{w}} \le O_{S,\mathbf{w}}$ holds for any weighting vector $\mathbf{w}$, any
t-norm T and any t-conorm S. This observation, along with arguments rooted
in the decision making field [245], leads to the following generalization.


Definition 4.84 (ST-OWA function). Let $\mathbf{w} \in [0,1]^n$ be a weighting
vector, σ the attitudinal character (orness measure³²) of the OWA function
$OWA_{\mathbf{w}}$, T a t-norm and S a t-conorm. The aggregation function
$O_{S,T,\mathbf{w}} : [0,1]^n \to [0,1]$ defined as


$$O_{S,T,\mathbf{w}}(x_1,\ldots,x_n) = \sum_{i=1}^{n} w_i \cdot \big((1-\sigma)\,T(x_{(1)},\ldots,x_{(i)}) + \sigma\,S(x_{(i)},\ldots,x_{(n)})\big),$$


where $(x_{(1)},\ldots,x_{(n)}) = \mathbf{x}_{\searrow}$, is called an ST-OWA function.


Note 4.85. Similarly to the cases of T-OWAs and S-OWAs, the following special
weighting vectors are to be noted:


• If $\mathbf{w} = (1,0,\ldots,0)$, then $O_{S,T,\mathbf{w}} = S$.
• If $\mathbf{w} = (0,\ldots,0,1)$, then $O_{S,T,\mathbf{w}} = T$.


Note 4.86. Definition 4.84 could clearly have been written as
$$O_{S,T,\mathbf{w}}(x_1,\ldots,x_n) = (1-\sigma)\cdot O_{T,\mathbf{w}}(x_1,\ldots,x_n) + \sigma\cdot O_{S,\mathbf{w}}(x_1,\ldots,x_n),$$


so ST-OWAs are just linear convex combinations of a T-OWA and an S-OWA, which
become simple OWAs in the limiting case T = min and S = max (i.e.,
$O_{\max,\min,\mathbf{w}} = OWA_{\mathbf{w}}$).
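The three definitions can be sketched together. In the code below, which uses illustrative names and assumes n ≥ 2 and weights summing to one, a t-norm or t-conorm is any callable on a non-empty list; Python's built-in `min` and `max` and `math.prod` (the product t-norm) already have this shape, and `prob_sum` is the probabilistic sum, dual to the product.

```python
import math


def prob_sum(xs):
    """Probabilistic sum S_P, the t-conorm dual to the product t-norm."""
    return 1.0 - math.prod(1.0 - x for x in xs)


def orness(weights):
    """Orness of OWA_w: sum_i w_i * (n - i) / (n - 1), for n >= 2."""
    n = len(weights)
    return sum(w * (n - i) / (n - 1) for i, w in enumerate(weights, start=1))


def t_owa(weights, xs, t_norm):
    """T-OWA: sum_i w_i * T(x_(1), ..., x_(i))."""
    ordered = sorted(xs, reverse=True)
    return sum(w * t_norm(ordered[:i]) for i, w in enumerate(weights, start=1))


def s_owa(weights, xs, t_conorm):
    """S-OWA: sum_i w_i * S(x_(i), ..., x_(n))."""
    ordered = sorted(xs, reverse=True)
    return sum(w * t_conorm(ordered[i - 1:]) for i, w in enumerate(weights, start=1))


def st_owa(weights, xs, t_norm, t_conorm):
    """ST-OWA: convex combination of the T-OWA and the S-OWA, weighted by the orness."""
    sigma = orness(weights)
    return ((1.0 - sigma) * t_owa(weights, xs, t_norm)
            + sigma * s_owa(weights, xs, t_conorm))


if __name__ == "__main__":
    w = [0.2, 0.5, 0.3]
    xs = [0.4, 0.9, 0.1]
    print(t_owa(w, xs, math.prod), st_owa(w, xs, math.prod, prob_sum), s_owa(w, xs, prob_sum))
```

Passing `min` and `max` recovers the ordinary OWA function, which is a convenient regression check.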


4.7.2 Main properties 


T-OWAs, S-OWAs and ST-OWAs are obviously symmetric aggregation functions.
Some other interesting properties are listed below:


Comparison, Ordering and Bounds 


32 Recall that the orness measure of $OWA_{\mathbf{w}}$ is given (see Definition 2.2) by


$orness(\mathbf{w}) = \sum_{i=1}^{n} w_i \cdot \frac{n-i}{n-1}$.
T-OWAs. For any t-norms $T_1$ and $T_2$ such that $T_1 \le T_2$ it is
$O_{T_1,\mathbf{w}} \le O_{T_2,\mathbf{w}}$.

This property, which is easily verified, provides the following inequalities,
obtained from the basic ordering properties of t-norms (see Chapter 3):


$$O_{T_D,\mathbf{w}} \le O_{T_L,\mathbf{w}} \le O_{T_P,\mathbf{w}} \le O_{\min,\mathbf{w}} = OWA_{\mathbf{w}},$$
$$O_{T_D,\mathbf{w}} \le O_{T,\mathbf{w}} \le O_{\min,\mathbf{w}} = OWA_{\mathbf{w}}.$$


On the other hand, it is not difficult to prove that any t-norm is weaker
than the T-OWA built from it, so the following lower bound may be
established for any T-OWA:


$$T \le O_{T,\mathbf{w}}.$$


S-OWAs. For any t-conorms $S_1$ and $S_2$ such that $S_1 \le S_2$ it is
$O_{S_1,\mathbf{w}} \le O_{S_2,\mathbf{w}}$, and then:


$$OWA_{\mathbf{w}} = O_{\max,\mathbf{w}} \le O_{S_P,\mathbf{w}} \le O_{S_L,\mathbf{w}} \le O_{S_D,\mathbf{w}},$$
$$OWA_{\mathbf{w}} = O_{\max,\mathbf{w}} \le O_{S,\mathbf{w}} \le O_{S_D,\mathbf{w}}.$$


Similarly to T-OWAs, the following upper bound applies for any S-OWA:
$$O_{S,\mathbf{w}} \le S.$$


ST-OWAs. Since ST-OWAs are linear convex combinations of their
corresponding T-OWA and S-OWA functions (see Note 4.86), it is


$$O_{T,\mathbf{w}} \le O_{S,T,\mathbf{w}} \le O_{S,\mathbf{w}},$$
which of course implies


$$T \le O_{S,T,\mathbf{w}} \le S.$$


Attitudinal Character (orness measure) Recall from Chapter 2, Definition 2.2,
that each $OWA_{\mathbf{w}}$ has an associated number in [0,1] known as the
attitudinal character or orness value, which measures the distance from
the minimum function, and which may be calculated using the OWA function
itself as follows:


$$orness(\mathbf{w}) = OWA_{\mathbf{w}}\left(1, \frac{n-2}{n-1}, \ldots, 0\right) = \sum_{i=1}^{n} w_i \cdot \frac{n-i}{n-1}.$$


This idea can be generalized to the case of T-OWAs, S-OWAs and ST-OWAs
as follows:
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T-OWAs. The orness value of a T-OWA Or w, denoted by orness 
(T, w), is a value in [0,1] given by 


—2 
orness(T, w) = Orw (i aa etd 0) 
n—-1 


n n—i 
= wet (iF), 


Note that orness(T, (1,0,...,0)) = 1 and orness(T, (0,...,0,1)) =0. 
Also, the fact that Or,w < OW Aw implies orness(T,w) < orness(w), 
that is, T-OWAs cannot be more disjunctive than OWA functions. 
S-OWAs. The orness value of a S-OWA Os,w is a value in [0,1] de- 
noted by orness(S,w) and given by 





—2 
orness(S,w) = Os.w (2...0) 
n= 
7 n—i 


Similarly to the case of T-OWAs, it is orness(S,(1,0,...,0)) = 1 and 
orness(S,(0,...,0,1)) = 0, and, because of the inequality OW Aw < 
Os.w, it is orness(w) < orness(S,w), that is, S-OWAs are at least as 
disjunctive as OWA functions (and, of course, more disjunctive than 
T-OWAs). 

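ST-OWAs. In analogy to T-OWAs and S-OWAs, the orness value of
a ST-OWA is a value $\mathrm{orness}(S,T,\mathbf{w}) \in [0,1]$ that is computed as

$$\mathrm{orness}(S,T,\mathbf{w}) = O_{S,T,\mathbf{w}}\left(1, \frac{n-2}{n-1}, \dots, 0\right)
= (1-\sigma)\cdot \mathrm{orness}(T,\mathbf{w}) + \sigma\cdot \mathrm{orness}(S,\mathbf{w}),$$

where $\sigma = \mathrm{orness}(\mathbf{w})$.

Clearly, $\mathrm{orness}(S,T,(1,0,\dots,0)) = 1$, $\mathrm{orness}(S,T,(0,\dots,0,1)) = 0$,
and, in general, $\mathrm{orness}(T,\mathbf{w}) \le \mathrm{orness}(S,T,\mathbf{w}) \le \mathrm{orness}(S,\mathbf{w})$.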
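The three orness values can be evaluated numerically. The following is a minimal sketch rather than a reference implementation: it assumes the n-ary t-norm and t-conorm are obtained by folding their binary forms (valid by associativity), takes the ST-OWA parameter to be the orness of the weighting vector, and uses illustrative function names (`t_owa`, `s_owa`, `st_owa`, `orness_point`) that do not come from the text.

```python
from functools import reduce


def t_lukasiewicz(a, b):
    """Lukasiewicz t-norm T_L (binary form)."""
    return max(0.0, a + b - 1.0)


def s_prob_sum(a, b):
    """Probabilistic sum t-conorm S_P (binary form)."""
    return a + b - a * b


def owa(w, x):
    """Standard OWA: weights applied to the inputs sorted in decreasing order."""
    xs = sorted(x, reverse=True)
    return sum(wi * xi for wi, xi in zip(w, xs))


def t_owa(t, w, x):
    """T-OWA: sum_i w_i * T(x_(1), ..., x_(i))."""
    xs = sorted(x, reverse=True)
    return sum(w[i] * reduce(t, xs[: i + 1]) for i in range(len(xs)))


def s_owa(s, w, x):
    """S-OWA: sum_i w_i * S(x_(i), ..., x_(n))."""
    xs = sorted(x, reverse=True)
    return sum(w[i] * reduce(s, xs[i:]) for i in range(len(xs)))


def orness_point(n):
    """The point (1, (n-2)/(n-1), ..., 0) at which orness is evaluated."""
    return [(n - i) / (n - 1) for i in range(1, n + 1)]


def st_owa(s, t, w, x):
    """ST-OWA: convex combination of T-OWA and S-OWA with sigma = orness(w)."""
    sigma = owa(w, orness_point(len(w)))
    return (1 - sigma) * t_owa(t, w, x) + sigma * s_owa(s, w, x)


w = [0.1, 0.2, 0.3, 0.4]
p = orness_point(len(w))
orness_w = owa(w, p)
orness_t = t_owa(t_lukasiewicz, w, p)
orness_s = s_owa(s_prob_sum, w, p)
orness_st = st_owa(s_prob_sum, t_lukasiewicz, w, p)
print(orness_t, orness_st, orness_w, orness_s)
```

For n up to 3 the T-OWA orness coincides with the OWA orness, because T(1, a) = a for any t-norm; the example therefore uses n = 4, where the ordering of the orness values becomes visible.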


Absorbing element We have seen in Section 4.7.1 that T-OWAs, S-OWAs
and ST-OWAs coincide, in some limiting cases, with t-norms and t-conorms.
This entails that, at least in these cases, they will possess an
absorbing element in {0,1}. Namely:

• When $\mathbf{w} = (1,0,\dots,0)$, since $O_{T,\mathbf{w}} = \max$ and
$O_{S,\mathbf{w}} = O_{S,T,\mathbf{w}} = S$, these three functions have $a = 1$ as
absorbing element.

• When $\mathbf{w} = (0,\dots,0,1)$, the corresponding T-OWAs, S-OWAs and
ST-OWAs have $a = 0$ as absorbing element, since $O_{S,\mathbf{w}} = \min$ and
$O_{T,\mathbf{w}} = O_{S,T,\mathbf{w}} = T$.

On the other hand it is easy to prove that these are the only cases where
T-OWAs, S-OWAs and ST-OWAs possess an absorbing element.

Continuity S-OWA, T-OWA and ST-OWA are continuous if the participating
t-norm and t-conorm T and S are continuous. Furthermore, if T and
S are Lipschitz (for any number of arguments), then S-OWA, T-OWA and
ST-OWA are also Lipschitz with the same Lipschitz constant.

Neutral element Similarly to the case of absorbing element, T-OWAs,
S-OWAs and ST-OWAs only possess a neutral element in some special
limiting cases. More precisely, a T-OWA (respectively, S-OWA, ST-OWA)
has a neutral element if and only if one of the following situations holds:

• $\mathbf{w} = (1,0,\dots,0)$, in which case it is $O_{T,\mathbf{w}} = \max$
(respectively $O_{S,\mathbf{w}} = O_{S,T,\mathbf{w}} = S$) and $e = 0$.
• $\mathbf{w} = (0,\dots,0,1)$, in which case it is $O_{T,\mathbf{w}} = T$
(respectively $O_{S,\mathbf{w}} = \min$, $O_{S,T,\mathbf{w}} = T$) and $e = 1$.

Duality When using the standard negation $N(t) = 1 - t$, the classes of
T-OWAs and S-OWAs are dual to each other:

• The dual function of a T-OWA $O_{T,\mathbf{w}}$ is the S-OWA $O_{S_d,\hat{\mathbf{w}}}$, where $S_d$ is
the t-conorm dual to $T$ (i.e., $S_d(x_1,\dots,x_n) = 1 - T(1-x_1,\dots,1-x_n)$)
and $\hat{\mathbf{w}}$ is the reverse of $\mathbf{w}$, that is, $\hat{w}_j = w_{n-j+1}$ for any $j \in \{1,\dots,n\}$.

• The dual function of a S-OWA $O_{S,\mathbf{w}}$ is the T-OWA $O_{T_d,\hat{\mathbf{w}}}$, where $T_d$ is
the t-norm dual to $S$ and $\hat{\mathbf{w}}$ is, as previously, the reverse of $\mathbf{w}$.

Note that the attitudinal characters of a T-OWA and its dual S-OWA are
complementary, that is

$$\mathrm{orness}(T,\mathbf{w}) = 1 - \mathrm{orness}(S_d,\hat{\mathbf{w}}),$$
$$\mathrm{orness}(S,\mathbf{w}) = 1 - \mathrm{orness}(T_d,\hat{\mathbf{w}}).$$

Regarding ST-OWAs, it is easy to check that this class is closed under
duality w.r.t. the standard negation, that is, the dual of a ST-OWA $O_{S,T,\mathbf{w}}$
is in turn a ST-OWA, given by $O_{S_d,T_d,\hat{\mathbf{w}}}$. This allows one to find self-dual
ST-OWAs: indeed, any ST-OWA $O_{S,T,\mathbf{w}}$ such that $(T,S)$ is a dual pair
and $\mathbf{w}$ is symmetric (i.e., it verifies $\mathbf{w} = \hat{\mathbf{w}}$) is self-dual. The attitudinal
characters of a ST-OWA and its dual are complementary, that is:

$$\mathrm{orness}(S,T,\mathbf{w}) = 1 - \mathrm{orness}(S_d,T_d,\hat{\mathbf{w}}).$$

The latter entails that self-dual ST-OWAs have orness value 1/2.
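The duality between T-OWAs and S-OWAs can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the Łukasiewicz t-norm, whose dual under the standard negation is the bounded sum, folds the binary operations to obtain their n-ary forms, and compares the dual of a T-OWA with the S-OWA built from the reversed weighting vector on random points. The helper names (`dual`, `t_owa`, `s_owa`) are illustrative and do not come from the text.

```python
import random
from functools import reduce


def t_lukasiewicz(a, b):
    """Lukasiewicz t-norm (binary form)."""
    return max(0.0, a + b - 1.0)


def s_lukasiewicz(a, b):
    """Dual of T_L under N(t) = 1 - t: the bounded sum."""
    return min(1.0, a + b)


def t_owa(t, w, x):
    """T-OWA: sum_i w_i * T(x_(1), ..., x_(i)), inputs sorted decreasingly."""
    xs = sorted(x, reverse=True)
    return sum(w[i] * reduce(t, xs[: i + 1]) for i in range(len(xs)))


def s_owa(s, w, x):
    """S-OWA: sum_i w_i * S(x_(i), ..., x_(n)), inputs sorted decreasingly."""
    xs = sorted(x, reverse=True)
    return sum(w[i] * reduce(s, xs[i:]) for i in range(len(xs)))


def dual(f):
    """Dual of f with respect to the standard negation."""
    return lambda x: 1.0 - f([1.0 - xi for xi in x])


w = [0.5, 0.3, 0.2]
w_rev = w[::-1]  # reversed weighting vector
dual_of_t_owa = dual(lambda x: t_owa(t_lukasiewicz, w, x))

random.seed(1)
max_gap = 0.0
for _ in range(1000):
    x = [random.random() for _ in w]
    gap = abs(dual_of_t_owa(x) - s_owa(s_lukasiewicz, w_rev, x))
    max_gap = max(max_gap, gap)
print(max_gap)  # differences are at the level of rounding error
```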


4.7.3 Examples 


Example 4.87 (T-OWA with $T = T_L$). When using the Łukasiewicz t-norm
$T_L$, the following T-OWA is obtained

$$O_{T_L,\mathbf{w}}(x_1,\dots,x_n) = \sum_{i=1}^{n} w_i \cdot \max\Big(0, \sum_{j=1}^{i} x_{(j)} - (i-1)\Big).$$

When $n = 2$ the above function may be written as follows (see Figure 4.22
for a 3D plot with weighting vector $\mathbf{w} = (0.3, 0.7)$)

$$O_{T_L,\mathbf{w}}(x,y) = w_1 \cdot \max(x,y) + (1-w_1)\cdot \max(0, x+y-1).$$

Example 4.88 (S-OWA with $S = S_P$). When using the probabilistic sum
t-conorm $S_P$, the following S-OWA is obtained

$$O_{S_P,\mathbf{w}}(x_1,\dots,x_n) = \sum_{i=1}^{n} w_i \cdot \Big(1 - \prod_{j=i}^{n} (1 - x_{(j)})\Big).$$

When $n = 2$ the above function may be written as follows (see Figure 4.22
for a 3D plot with weighting vector $\mathbf{w} = (0.3, 0.7)$)

$$O_{S_P,\mathbf{w}}(x,y) = w_1 \cdot (x+y-xy) + (1-w_1)\cdot \min(x,y).$$

Fig. 4.22. 3D plots of the T-OWA $O_{T_L,(0.3,0.7)}$ (example 4.87) and S-OWA
$O_{S_P,(0.3,0.7)}$ (example 4.88).

Example 4.89 (ST-OWA with $T = T_L$ and $S = S_P$). Consider now the following
ST-OWA, built from the Łukasiewicz t-norm and the probabilistic sum
t-conorm

$$O_{S_P,T_L,\mathbf{w}}(x_1,\dots,x_n) = (1-\sigma)\cdot O_{T_L,\mathbf{w}}(x_1,\dots,x_n) + \sigma\cdot O_{S_P,\mathbf{w}}(x_1,\dots,x_n).$$

In the bivariate case the above expression can be simplified as follows (see
Figure 4.23 for a 3D plot with $\mathbf{w} = (0.3, 0.7)$)

$$O_{S_P,T_L,\mathbf{w}}(x,y) = w_1(1-w_1)(x+y) + w_1^2\,(x+y-xy) + (1-w_1)^2 \max(0, x+y-1).$$

Fig. 4.23. 3D plot of the ST-OWA $O_{S_P,T_L,(0.3,0.7)}$ (example 4.89).


4.7.4 U-OWA functions 


Consider now a different generalization of S- and T-OWA functions, which
involves a value $e \in [0,1]$ used to separate the domain into conjunctive,
disjunctive and averaging parts, as shown in the corresponding figure. The purpose
is to obtain a family of aggregation functions which exhibit (depending on
the parameters) either averaging or conjunctive behavior on $[0,e]^n$, either
averaging or disjunctive behavior on $[e,1]^n$, and averaging behavior elsewhere.
The behavior of $f$ on the subsets $[0,e]^n$ and $[e,1]^n$ is similar to that of T-OWAs
and S-OWAs respectively. Our construction involves the value $e \in [0,1]$, the
weighting vector $\mathbf{w}$ and scaled versions of a T-OWA and S-OWA.

We recall (see Section 4.2) that a uninorm $U$ behaves like a scaled t-norm
on $[0,e]^n$, a scaled t-conorm on $[e,1]^n$ and is averaging elsewhere. $U$ has neutral
element $e \in\, ]0,1[$ and is associative. Because of the similarity of the behavior
of the function defined in Definition 4.90 with that of uninorms, we call it
U-OWA. But we note that it does not possess a neutral element (except in the
limiting cases), nor is it associative.


Definition 4.90 (U-OWA function). Let $\mathbf{w} \in [0,1]^n$ be a weighting vector,
$e \in [0,1]$, and let $T$ and $S$ be a t-norm and t-conorm respectively. The
aggregation function $O_{T,S,e,\mathbf{w}} : [0,1]^n \to [0,1]$ defined as

$$O_{T,S,e,\mathbf{w}}(\mathbf{x}) = \begin{cases}
\tilde{O}_{T,\mathbf{w}}(\mathbf{x}), & \text{if } \mathbf{x} \in [0,e]^n, \\
\tilde{O}_{S,\mathbf{w}}(\mathbf{x}), & \text{if } \mathbf{x} \in [e,1]^n, \\
OWA_{\mathbf{w}}(\mathbf{x}) & \text{otherwise},
\end{cases}$$

where $\tilde{O}_{T,\mathbf{w}}$ and $\tilde{O}_{S,\mathbf{w}}$ are scaled versions³³ of T-OWA and S-OWA, is called
a U-OWA.


33 We remind that scaled versions are obtained as
$\tilde{O}_{T,\mathbf{w}}(\mathbf{x}) = e \cdot O_{T,\mathbf{w}}\left(\frac{x_1}{e},\dots,\frac{x_n}{e}\right)$
and $\tilde{O}_{S,\mathbf{w}}(\mathbf{x}) = e + (1-e)\cdot O_{S,\mathbf{w}}\left(\frac{x_1-e}{1-e},\dots,\frac{x_n-e}{1-e}\right)$.

Note 4.91. The following special cases should be noted:

• If $e \in \{0,1\}$ we obtain S-OWA and T-OWA respectively.
• If $\mathbf{w} = (1,0,\dots,0)$, then $O_{T,S,e,\mathbf{w}} = (\langle e,1,S \rangle)$, an ordinal sum t-conorm,
compare to Proposition 3.133.
• If $\mathbf{w} = (0,\dots,0,1)$, then $O_{T,S,e,\mathbf{w}} = (\langle 0,e,T \rangle)$, an ordinal sum t-norm; if
$T = T_P$ then it belongs to the Dubois-Prade family.

Properties of U-OWA follow from those of S-OWA and T-OWA and include:

• Measure of orness: $\mathrm{orness}(T,S,e,\mathbf{w}) = O_{T,S,e,\mathbf{w}}\left(1, \frac{n-2}{n-1}, \dots, 0\right)$;
• $\mathrm{orness}(T,S,e,(1,0,\dots,0)) = 1$, $\mathrm{orness}(T,S,e,(0,0,\dots,1)) = 0$;
• $O_{T,S,e,\mathbf{w}}$ has a neutral element (not coinciding with $e$) if and only if
$\mathbf{w} = (1,0,\dots,0)$ (the neutral element is 0) or $\mathbf{w} = (0,\dots,0,1)$ (the neutral
element is 1);
• If the underlying t-norm and t-conorm are min and max, $O_{\min,\max,e,\mathbf{w}} = OWA_{\mathbf{w}}$;
• For $n > 2$, $O_{T,S,e,\mathbf{w}}$ is continuous only in one of the limiting cases: a)
$OWA_{\mathbf{w}}$, b) continuous S-OWA ($e = 0$), c) continuous T-OWA ($e = 1$), d)
$\mathbf{w} = (1,0,\dots,0)$ and a continuous $S$, e) $\mathbf{w} = (0,\dots,0,1)$ and a continuous $T$;
• Any U-OWA is symmetric.
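The piecewise definition of a U-OWA translates directly into code. The following is a minimal sketch under stated assumptions: the scaled versions use the usual rescaling of $[0,e]^n$ and $[e,1]^n$ to the unit cube, the binary t-norm and t-conorm are folded to obtain n-ary forms, and the function names (`u_owa`, `t_owa`, `s_owa`) are illustrative only. The guards on `e` avoid division by zero in the limiting cases e = 0 and e = 1.

```python
from functools import reduce


def owa(w, x):
    """Standard OWA on inputs sorted in decreasing order."""
    xs = sorted(x, reverse=True)
    return sum(wi * xi for wi, xi in zip(w, xs))


def t_owa(t, w, x):
    """T-OWA: sum_i w_i * T(x_(1), ..., x_(i))."""
    xs = sorted(x, reverse=True)
    return sum(w[i] * reduce(t, xs[: i + 1]) for i in range(len(xs)))


def s_owa(s, w, x):
    """S-OWA: sum_i w_i * S(x_(i), ..., x_(n))."""
    xs = sorted(x, reverse=True)
    return sum(w[i] * reduce(s, xs[i:]) for i in range(len(xs)))


def u_owa(t, s, e, w, x):
    """U-OWA: scaled T-OWA on [0,e]^n, scaled S-OWA on [e,1]^n, OWA elsewhere."""
    if e > 0 and all(xi <= e for xi in x):
        return e * t_owa(t, w, [xi / e for xi in x])
    if e < 1 and all(xi >= e for xi in x):
        return e + (1 - e) * s_owa(s, w, [(xi - e) / (1 - e) for xi in x])
    return owa(w, x)


def t_prod(a, b):
    """Product t-norm."""
    return a * b


def s_prob_sum(a, b):
    """Probabilistic sum t-conorm."""
    return a + b - a * b


w = [0.3, 0.7]
low = u_owa(t_prod, s_prob_sum, 0.5, w, [0.2, 0.4])    # conjunctive region
high = u_owa(t_prod, s_prob_sum, 0.5, w, [0.6, 0.9])   # disjunctive region
mixed = u_owa(t_prod, s_prob_sum, 0.5, w, [0.2, 0.9])  # averaging region
print(low, high, mixed)
```

With min and max as the underlying operations the same function reduces to the plain OWA, in line with the property listed above.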


4.7.5 Fitting to the data 


We consider various instances of the problem of fitting parameters of ST-OWA
to empirical data. As in earlier sections, we assume that there is a set
of input-output pairs $\{(\mathbf{x}_k, y_k)\}$, $k = 1,\dots,K$, with $\mathbf{x}_k \in [0,1]^n$, $y_k \in [0,1]$ and
$n$ fixed. Our goal is to determine the parameters $S$, $T$, $\mathbf{w}$ which fit the data
best.


Identification with fixed S and T

In this instance of the problem we assume that both $S$ and $T$ have been
specified. The issue is to determine the weighting vector $\mathbf{w}$. For S-OWA and
T-OWA, fitting the data in the least squares sense involves the solution of a
quadratic programming problem (QP)

$$\text{Minimize } \sum_{k=1}^{K}\Big(\sum_{i=1}^{n} w_i\, S(x_{(i)k},\dots,x_{(n)k}) - y_k\Big)^2 \tag{4.25}$$
$$\text{s.t. } \sum_{i=1}^{n} w_i = 1,\quad w_i \ge 0,$$

and similarly for the case of T-OWA

$$\text{Minimize } \sum_{k=1}^{K}\Big(\sum_{i=1}^{n} w_i\, T(x_{(1)k},\dots,x_{(i)k}) - y_k\Big)^2 \tag{4.26}$$
$$\text{s.t. } \sum_{i=1}^{n} w_i = 1,\quad w_i \ge 0.$$

We note that the values of $S$ and $T$ at any $\mathbf{x}_k$ are fixed (do not depend
on $\mathbf{w}$). This problem is very similar to that of calculating the weights
of standard OWA functions from data, but involves the fixed functions
$S(x_{(i)k},\dots,x_{(n)k})$ and $T(x_{(1)k},\dots,x_{(i)k})$ rather than just $x_{(i)k}$.

If an additional requirement is to have a specified value of $\mathrm{orness}(S,\mathbf{w})$
or $\mathrm{orness}(T,\mathbf{w})$, then it becomes just an additional linear constraint, which
does not change the structure of the QP problem (4.25) or (4.26).
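For a small number of inputs the least squares problem for S-OWA weights with a fixed t-conorm can be solved without a QP library. The sketch below is not the QP algorithm referred to in the text: as a swapped-in technique for illustration, it enumerates the possible supports of the weighting vector, solves the equality-constrained least squares problem on each support through its KKT system, and keeps the best non-negative solution. This is exact for a convex problem but practical only for small n. All function names and the synthetic data are hypothetical.

```python
import random
from functools import reduce
from itertools import combinations


def s_prob_sum(a, b):
    """Probabilistic sum t-conorm (binary form)."""
    return a + b - a * b


def s_owa_basis(s, x):
    """Basis values S(x_(i), ..., x_(n)), i = 1..n, for inputs sorted decreasingly."""
    xs = sorted(x, reverse=True)
    return [reduce(s, xs[i:]) for i in range(len(xs))]


def solve_linear(m, rhs):
    """Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(m, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            return None
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def fit_simplex_ls(rows, ys):
    """Minimise sum_k (<rows[k], w> - ys[k])^2 s.t. sum(w) = 1, w >= 0."""
    n = len(rows[0])
    best_w, best_err = None, float("inf")
    for size in range(1, n + 1):
        for support in combinations(range(n), size):
            # KKT system: 2 A^T A w + lambda * 1 = 2 A^T y, sum(w) = 1
            m = [[2 * sum(r[i] * r[j] for r in rows) for j in support] + [1.0]
                 for i in support]
            m.append([1.0] * size + [0.0])
            rhs = [2 * sum(r[i] * y for r, y in zip(rows, ys)) for i in support]
            rhs.append(1.0)
            sol = solve_linear(m, rhs)
            if sol is None or min(sol[:size]) < -1e-9:
                continue  # singular system or infeasible (negative weight)
            w = [0.0] * n
            for i, v in zip(support, sol):
                w[i] = max(v, 0.0)
            err = sum((sum(wi * ri for wi, ri in zip(w, r)) - y) ** 2
                      for r, y in zip(rows, ys))
            if err < best_err:
                best_w, best_err = w, err
    return best_w, best_err


# Synthetic, noise-free data generated from known weights (illustration only).
random.seed(0)
w_true = [0.2, 0.5, 0.3]
inputs = [[random.random() for _ in w_true] for _ in range(40)]
rows = [s_owa_basis(s_prob_sum, x) for x in inputs]
ys = [sum(wi * ri for wi, ri in zip(w_true, r)) for r in rows]
w_fit, err = fit_simplex_ls(rows, ys)
print(w_fit, err)
```

An orness requirement would enter as one more linear equality row in the KKT system; for larger n a dedicated QP solver is the appropriate tool.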

Next, consider fitting ST-OWA. Here, for a fixed value of $\mathrm{orness}(\mathbf{w}) = \sigma$,
we have a QP problem

$$\text{Minimize } \sum_{k=1}^{K}\Big(\sum_{i=1}^{n} w_i\, ST_i(\mathbf{x}_k,\sigma) - y_k\Big)^2 \tag{4.27}$$
$$\text{s.t. } \sum_{i=1}^{n} w_i = 1,\quad w_i \ge 0,\quad \mathrm{orness}(\mathbf{w}) = \sigma,$$

where

$$ST_i(\mathbf{x},\sigma) = (1-\sigma)\,T(x_{(1)},\dots,x_{(i)}) + \sigma\, S(x_{(i)},\dots,x_{(n)}).$$

However, $\sigma$ may not always be specified, and hence has to be found from
the data. In this case, we present a bi-level optimization problem, in which
at the outer level nonlinear (possibly global) optimization is performed with
respect to the parameter $\sigma$, and at the inner level the problem (4.27) with a fixed
$\sigma$ is solved:

$$\text{Minimize}_{\sigma \in [0,1]}\; [F(\sigma)], \tag{4.28}$$

where $F(\sigma)$ denotes the solution to (4.27).

Numerical solution of the outer problem with just one variable can be
performed by a number of methods, including grid search, multistart local
search, or the Pijavski-Shubert method, see Appendix A.5.5. The inner QP problem
is solved by standard efficient algorithms, see Appendix A.5.


Identification of T-OWA and S-OWA 


Consider now the problem of fitting parameters of the parametric families of
the participating t-norm and t-conorm, simultaneously with $\mathbf{w}$ and $\sigma$. We start
with S-OWA, and assume a suitable family of t-conorms $S$ has been chosen,
e.g., Yager t-conorms $S^Y_p$ parameterized with $p$. We will rely on the efficient
solution of problem (4.25) with a fixed $S$ (i.e., fixed $p$). We set up a bi-level
optimization problem

$$\text{Minimize}_{p \in [0,\infty]}\; [F_1(p)],$$

where $F_1(p)$ denotes the solution to (4.25).

The outer problem is a nonlinear, possibly global optimization problem, but
because it has only one variable, its solution is relatively simple. We recommend
the Pijavski-Shubert deterministic method (Appendix A.5.5). Identification
of $T$ is performed analogously.
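The bi-level scheme can be illustrated in the bivariate case, where the inner problem has a closed-form solution. The sketch below makes several assumptions that are not in the text: n = 2, so that the weighting vector is (w1, 1 - w1) and the inner least squares problem reduces to a one-variable quadratic clipped to [0,1]; the outer problem is solved by a plain grid search rather than the Pijavski-Shubert method; and the data are synthetic. The names `inner_fit` and `outer_grid_search` are illustrative.

```python
import random


def s_yager(p, a, b):
    """Yager t-conorm with parameter p > 0 (binary form)."""
    return min(1.0, (a ** p + b ** p) ** (1.0 / p))


def inner_fit(p, data):
    """Best w1 in [0,1] for the bivariate S-OWA with fixed p, and its squared error."""
    num = den = 0.0
    terms = []
    for (x1, x2), y in data:
        a = s_yager(p, x1, x2)  # S(x_(1), x_(2)), weighted by w1
        b = min(x1, x2)         # S(x_(2)) = x_(2), weighted by 1 - w1
        terms.append((a, b, y))
        num += (a - b) * (y - b)
        den += (a - b) ** 2
    # Unconstrained minimiser of sum (b + w1 (a - b) - y)^2, clipped to [0, 1]
    w1 = min(1.0, max(0.0, num / den)) if den > 0 else 0.5
    err = sum((w1 * a + (1 - w1) * b - y) ** 2 for a, b, y in terms)
    return w1, err


def outer_grid_search(data, grid):
    """Outer level: try each p on the grid, keep the one with the smallest inner error."""
    best = None
    for p in grid:
        w1, err = inner_fit(p, data)
        if best is None or err < best[2]:
            best = (p, w1, err)
    return best


# Synthetic data generated with p = 2 and w = (0.4, 0.6) (illustration only).
random.seed(2)
points = [(random.random(), random.random()) for _ in range(50)]
data = [((x1, x2), 0.4 * s_yager(2.0, x1, x2) + 0.6 * min(x1, x2))
        for x1, x2 in points]
grid = [0.5 * k for k in range(1, 11)]  # p = 0.5, 1.0, ..., 5.0
best_p, best_w1, best_err = outer_grid_search(data, grid)
print(best_p, best_w1, best_err)
```

A grid only locates the parameter up to its spacing; a deterministic global method for the outer level, as recommended above, avoids that limitation.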

Next consider fitting ST-OWA functions. Here we have three parameters:
the two parameters of the participating t-norm and t-conorm, which we will
denote by $p_1, p_2$, and $\sigma$ as in Problem (4.27). Of course, $T$ and $S$ may be
chosen as dual to each other, in which case we have to fit only one parameter
$p = p_1 = p_2$. To use the special structure of the problem with respect to $\mathbf{w}$
we again set up a bi-level optimization problem analogously to (4.28):

$$\text{Minimize}_{\sigma \in [0,1],\; p_1, p_2 \ge 0}\; [F(\sigma, p_1, p_2)], \tag{4.29}$$

where $F(\sigma, p_1, p_2)$ is the solution to the QP problem

$$\text{Minimize } \sum_{k=1}^{K}\Big(\sum_{i=1}^{n} w_i\, ST_i(\mathbf{x}_k,\sigma,p_1,p_2) - y_k\Big)^2 \tag{4.30}$$
$$\text{s.t. } \sum_{i=1}^{n} w_i = 1,\quad w_i \ge 0,\quad \mathrm{orness}(\mathbf{w}) = \sigma,$$

and

$$ST_i(\mathbf{x},\sigma,p_1,p_2) = (1-\sigma)\,T_{p_1}(x_{(1)},\dots,x_{(i)}) + \sigma\, S_{p_2}(x_{(i)},\dots,x_{(n)}).$$

Solution of the outer problem is complicated because of the possibility
of numerous local minima. One has to rely on methods of global optimization.
One suitable deterministic method is the Cutting Angle Method (CAM)
described in Appendix A.5.5.


Least absolute deviation problem 


Fitting ST-OWA to the data can also be performed by using the Least Absolute
Deviation (LAD) criterion [B5], by replacing the sum of squares in the
objective functions above with the sum of absolute values. As described in the
Appendix, such a problem is converted to linear programming, which replaces the
QP problems (4.25) - (4.27), (4.30). The outer nonlinear optimization problems
do not change.

5

Choice and Construction
of Aggregation Functions


5.1 Problem formalization 


In the previous Chapters we have studied many different families of aggrega- 
tion functions, that have been classified into four major classes: conjunctive, 
disjunctive, averaging and mixed aggregation functions. For applications it is 
very important to choose the right aggregation function, or maybe a number 
of them, to properly model the desired behavior of the system. There is no
single solution: each application calls for its own aggregation functions.

The usual approach is to choose a sufficiently flexible family of aggregation 
functions, and then adapt it to some sort of empirical data, be it observed or 
desired data. It is also important to satisfy application specific requirements, 
such as having a neutral element, symmetry or idempotency. 

In the preceding Chapters, when discussing different classes of aggregation functions,
we considered specific methods of fitting a given class or family of functions to 
empirical data. There are many similarities in the methods we considered, but 
also important differences. Now we present a unifying approach to selecting 
an aggregation function based on empirical data. 

We reiterate that typically the data comes in pairs $(\mathbf{x}, y)$, where $\mathbf{x} \in [0,1]^n$
is the input vector and $y \in [0,1]$ is the desired output. There are several pairs,
which will be denoted by a subscript $k$: $(\mathbf{x}_k, y_k)$, $k = 1,\dots,K$. The data may
be the result of a mental experiment: if we take the input values $(x_1, x_2, x_3)$,
what output do we expect? The data can be observed in a controlled
experiment: the developer of an application could ask the domain experts to
provide their opinion on the desired outputs for selected inputs. The data can
also be collected in another sort of experiment, by asking a group of lay people
or experts about their input and output values, but without associating
these values with some aggregation rule. Finally, the data can be collected
automatically by observing the responses of subjects to various stimuli, for
example by presenting a user of a computer system with some information
and recording their actions or decisions.

Typical characteristics of the data set are: a) some components of the vectors
$\mathbf{x}_k$ may be missing; b) the vectors $\mathbf{x}_k$ may have varying dimension by construction;
c) the outputs $y_k$ could be specified as a range of values (i.e., the interval
$[\underline{y}_k, \overline{y}_k]$); d) the data may contain random noise; e) the abscissae of the data
are scattered.

We now formalize the selection problem. 


Problem 5.1 (Selection of an aggregation function). Let us have a number
of mathematical properties $P_1, P_2, \dots$ and the data $D = \{(\mathbf{x}_k, y_k)\}_{k=1}^{K}$.
Choose an aggregation function $f$ consistent with $P_1, P_2, \dots$, and satisfying
$f(\mathbf{x}_k) \approx y_k$, $k = 1,\dots,K$.


5.2 Fitting empirical data 


In this section we will discuss different interpretations of the requirement
$f(\mathbf{x}_k) \approx y_k$, $k = 1,\dots,K$, which will lead to different types of regression
problems.

The most common interpretation of the approximate equalities in Problem
5.1 is in the least squares sense, which means that the goal is to minimize the
sum of squares of the differences between the predicted and observed values.
By using the notion of residuals $r_k = f(\mathbf{x}_k) - y_k$, the regression problem
becomes

$$\text{minimize } \|\mathbf{r}\|_2^2 \tag{5.1}$$
$$\text{subject to } f \text{ satisfying } P_1, P_2, \dots,$$

where $\|\mathbf{r}\|_2$ is the Euclidean norm of the vector of residuals¹.
Another useful interpretation is in the least absolute deviation sense, in
which the $\|\cdot\|_1$ norm of the residuals is minimized, i.e.,

$$\text{minimize } \|\mathbf{r}\|_1 \tag{5.2}$$
$$\text{subject to } f \text{ satisfying } P_1, P_2, \dots.$$


Of course, it is possible to choose any p-norm, Chebyshev norm, or any 
other common criterion, such as Maximal Likelihood (ML). Also, if the data 
have different degrees of importance, or accuracy, then weighted analogues of 
the mentioned criteria are used. 

A different interpretation of the approximate equality conditions was suggested
in [133]. It was asserted that when empirical data comes from human
judgements, the actual numerical values of $y_k$ are not as important as the
order in which the outputs are ranked. Thus one expects that if $y_k \le y_l$, then
it should be $f(\mathbf{x}_k) \le f(\mathbf{x}_l)$ for all $l, k$. Indeed, people are better at ranking the
alternatives than at assigning consistent numerical values. It is also a common
practice to ask people to rank the alternatives in order of decreasing preferences
(for example, during elections). Thus Problem 5.1 can be viewed as that
of satisfying linear constraints $f(\mathbf{x}_k) \le f(\mathbf{x}_l)$ if $y_k \le y_l$ for all pairs $k, l$.


1 See p. 22 for definitions of various norms.

It is also conceivable to consider a mixture of both approaches: to fit the 
numerical data and to preserve the ranking. This is a two-criteria optimization 
problem, and to develop a method of solution, we will aggregate both criteria 
in some way. 


5.3 General problem formulation 
Many types of aggregation functions can be represented in a generic form

$$y = f(\mathbf{x}; \mathbf{w}),$$

where $\mathbf{w}$ is a vector of parameters. For example, $\mathbf{w}$ is a vector of weights for
weighted means and OWA functions, or $\mathbf{w} = (\lambda)$ is a single parameter of a
parametric family of t-norms or power means, or $\mathbf{w} = (\lambda_T, \lambda_S, \alpha)$ is a vector of
parameters of a uninorm, or $\mathbf{w} = \mathbf{c}$ is the vector of spline coefficients.
Moreover, in many interesting cases $f$ depends on $\mathbf{w}$ linearly, and we
will make use of it in the next Section. The properties $P_1, P_2, \dots$ are expressed
through the conditions on $\mathbf{w}$.


Least squares fitting


The generic optimization problem is

$$\text{Minimize } \sum_{k=1}^{K} \big(f(\mathbf{x}_k; \mathbf{w}) - y_k\big)^2,$$

subject to (s.t.) conditions on $\mathbf{w}$.


Least absolute deviation fitting

The generic optimization problem is

$$\text{Minimize } \sum_{k=1}^{K} \big|f(\mathbf{x}_k; \mathbf{w}) - y_k\big|,$$

subject to conditions on $\mathbf{w}$.


Preservation of output rankings


To preserve the ranking of the outputs, we re-order the data in such a way
that $y_1 \le y_2 \le \dots \le y_K$ (this can always be done by sorting). The condition
for order preservation takes the form

$$f(\mathbf{x}_k; \mathbf{w}) \le f(\mathbf{x}_{k+1}; \mathbf{w}), \quad \text{for all } k = 1,\dots,K-1. \tag{5.3}$$

Then condition (5.3) is added to the least squares (LS) and the least
absolute deviation (LAD) problems in the form of constraints.


Fitting data and output rankings 


Since empirical data have an associated noise, it may be impossible to satisfy
all the constraints on $\mathbf{w}$ by using a specified class of aggregation functions.
In that case the system of constraints is said to be inconsistent.

Consider a revised version of the least squares problem

$$\text{Minimize } \sum_{k=1}^{K} \big(f(\mathbf{x}_k, \mathbf{w}) - y_k\big)^2 + P \sum_{k=1}^{K-1} \max\{f(\mathbf{x}_k, \mathbf{w}) - f(\mathbf{x}_{k+1}, \mathbf{w}),\, 0\}, \tag{5.4}$$

s.t. other conditions on $\mathbf{w}$.

Here $P$ is the penalty parameter: for small values of $P$ we emphasize fitting
the numerical data, while for large values of $P$ we emphasize preservation of
ordering. Of course, the second sum may not be zero at the optimum, which
indicates inconsistency of constraints.
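The penalized objective is straightforward to evaluate for any candidate function. The sketch below is an illustration only: it sorts the data by observed output, then adds the least squares term and the penalty term. The function name `penalized_objective`, the use of the arithmetic mean as candidate, and the two data points are assumptions made for the example.

```python
def penalized_objective(f, data, penalty):
    """Least squares term plus `penalty` times the total ranking violation.

    `data` is a list of (x, y) pairs; it is re-ordered so that
    y_1 <= y_2 <= ... <= y_K before the ranking term is computed.
    """
    ordered = sorted(data, key=lambda pair: pair[1])
    preds = [f(x) for x, _ in ordered]
    fit = sum((p - y) ** 2 for p, (_, y) in zip(preds, ordered))
    violation = sum(max(preds[k] - preds[k + 1], 0.0)
                    for k in range(len(preds) - 1))
    return fit + penalty * violation, fit, violation


def mean(x):
    """Arithmetic mean, used here as the candidate aggregation function."""
    return sum(x) / len(x)


# Two observations whose predicted values come out in the wrong order.
data = [((0.9, 0.1), 0.2), ((0.4, 0.4), 0.3)]
total, fit, violation = penalized_objective(mean, data, penalty=2.0)
print(total, fit, violation)
```

A non-zero `violation` at the optimum is the signal, mentioned above, that the ranking constraints cannot all be met by the chosen class.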


5.4 Special cases 


In their general form, problems in the previous section do not have a special 
structure, and their solution involves a difficult nonlinear global optimization 
problem. While there are a number of generic tools to deal with such prob- 
lems, they are only useful for small numbers of parameters. In this section we 
formulate the mentioned problems in the special case when f depends on the 
vector of parameters w linearly, which suits a large class of aggregation func- 
tions. In this case, the optimization problems do have a special structure: they 
become either quadratic or linear programming problems, and the solution is 
performed by very efficient numerical methods. 
Thus we assume that 


$$f(\mathbf{x}; \mathbf{w}) = \langle \mathbf{g}(\mathbf{x}), \mathbf{w} \rangle = \sum_{i} w_i\, g_i(\mathbf{x}),$$


$\mathbf{g}$ being a vector of some basis functions.

Example 5.2 (Weighted arithmetic means). In this case $\mathbf{w}$ is the
weighting vector and

$$g_i(\mathbf{x}) = x_i.$$

The usual constraints are $w_i \ge 0$, $\sum_{i=1}^{n} w_i = 1$.


Example 5.3 (Ordered Weighted Averaging (Section 2.3)). In this case $\mathbf{w}$ is
the weighting vector and

$$g_i(\mathbf{x}) = x_{(i)},$$

where $x_{(i)}$ denotes the $i$-th largest component of $\mathbf{x}$. The usual constraints are
$w_i \ge 0$, $\sum_{i} w_i = 1$. In addition, a frequent requirement is a fixed orness
measure,

$$\langle \mathbf{a}, \mathbf{w} \rangle = \alpha,$$

where $a_i = \frac{n-i}{n-1}$, and $0 \le \alpha \le 1$ is specified by the user.


Example 5.4 (Choquet integrals (Section 2.6)). We remind that a fuzzy measure
is a monotonic set function $v : 2^N \to [0,1]$ which satisfies $v(\emptyset) = 0$,
$v(N) = 1$. The discrete Choquet integral is

$$C_v(\mathbf{x}) = \sum_{i=1}^{n} \big[x_{(i)} - x_{(i-1)}\big]\, v(H_i), \tag{5.5}$$

where $x_{(0)} = 0$ by convention, the components are taken here in non-decreasing
order, and $H_i = \{(i),\dots,(n)\}$ is the subset of indices
of the $n - i + 1$ largest components of $\mathbf{x}$. A fuzzy measure has $2^n$ values, two of
which are fixed: $v(\emptyset) = 0$, $v(N) = 1$.

As discussed earlier, we represent the Choquet integral as a
scalar product $\langle \mathbf{g}(\mathbf{x}), \mathbf{v} \rangle$, where $\mathbf{v} \in [0,1]^{2^n}$ is the vector of values of the
fuzzy measure. It is convenient to use the index $j = 0,\dots,2^n - 1$ whose binary
representation corresponds to the characteristic vector of the set $J \subseteq N$,
$\mathbf{c} \in \{0,1\}^n$ defined by $c_{n-i+1} = 1$ if $i \in J$ and 0 otherwise. For example, let
$n = 5$; for $j = 101$ (binary), $\mathbf{c} = (0,0,1,0,1)$ and $v_j = v(\{1,3\})$. We shall use
letters $K$, $J$, etc., to denote subsets that correspond to indices $k$, $j$, etc.

Define the basis functions $g_j$, $j = 0,\dots,2^n - 1$ as

$$g_j(\mathbf{x}) = \max\Big(0,\; \min_{i \in J} x_i - \max_{i \in N \setminus J} x_i\Big),$$

where $J \subseteq N$ is the subset whose characteristic vector corresponds to the
binary representation of $j$. Then

$$C_v(\mathbf{x}) = \langle \mathbf{g}(\mathbf{x}), \mathbf{v} \rangle.$$
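The agreement between formula (5.5) and the scalar-product form can be verified on a small example. The sketch below is illustrative and makes the following assumptions: subsets are represented as `frozenset` objects with 0-based indices instead of the binary index $j$, the maximum over an empty complement is taken to be 0, and the fuzzy measure values are made up for the example.

```python
from itertools import combinations


def choquet_sorted(x, v):
    """Formula (5.5): inputs in increasing order, H_i = indices not yet passed."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    total, prev = 0.0, 0.0
    for pos, i in enumerate(order):
        h = frozenset(order[pos:])  # indices of the n - pos largest components
        total += (x[i] - prev) * v[h]
        prev = x[i]
    return total


def choquet_basis(x, v):
    """Scalar product <g(x), v> with g_J(x) = max(0, min_J x - max_{N minus J} x)."""
    n = len(x)
    total = 0.0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            inside = min(x[i] for i in subset)
            outside = max((x[i] for i in range(n) if i not in subset), default=0.0)
            total += v[frozenset(subset)] * max(0.0, inside - outside)
    return total


# A monotone fuzzy measure on N = {0, 1, 2} (values chosen for illustration).
v = {
    frozenset(): 0.0,
    frozenset({0}): 0.2, frozenset({1}): 0.3, frozenset({2}): 0.1,
    frozenset({0, 1}): 0.6, frozenset({0, 2}): 0.4, frozenset({1, 2}): 0.5,
    frozenset({0, 1, 2}): 1.0,
}
x = [0.5, 0.2, 0.9]
c_sorted = choquet_sorted(x, v)
c_basis = choquet_basis(x, v)
print(c_sorted, c_basis)
```

Only the subsets that collect the largest components of the input contribute a non-zero basis value, which is why the two computations coincide.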
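Example 5.5 (Linear T-S functions (Section 4.5)).

$$f(\mathbf{x}; \mathbf{w}) = w_1 T(\mathbf{x}) + w_2 S(\mathbf{x}) = \langle \mathbf{g}(\mathbf{x}), \mathbf{w} \rangle,$$

$\mathbf{g} = (T, S)$, $w_1 + w_2 = 1$, $w_1, w_2 \ge 0$.

Least squares fitting


$$\text{Minimize } \sum_{k=1}^{K} \big(\langle \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle - y_k\big)^2, \tag{5.6}$$

s.t. linear conditions on $\mathbf{w}$.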


Note that this is a standard quadratic programming problem due to the 
positive semidefinite quadratic objective function and linear constraints. 


Least absolute deviation fitting 


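By using the auxiliary variables $r_k^+, r_k^- \ge 0$, such that
$r_k^+ + r_k^- = |\langle \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle - y_k|$ and $r_k^+ - r_k^- = \langle \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle - y_k$, we
convert the LAD problem into a linear programming problem

$$\text{Minimize } \sum_{k=1}^{K} \big(r_k^+ + r_k^-\big), \tag{5.7}$$

$$\text{s.t. } \langle \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle - r_k^+ + r_k^- = y_k, \quad k = 1,\dots,K,$$
$$r_k^+, r_k^- \ge 0, \quad k = 1,\dots,K,$$

other linear conditions on $\mathbf{w}$.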
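The data of this linear program can be assembled mechanically. The following sketch only builds the cost vector and the equality constraints and checks them at one feasible point; it does not solve the LP. The variable ordering (weights first, then the positive and negative residual parts), the added constraint that the weights sum to one, and the names `build_lad_lp` and `feasible_point` are assumptions made for the example. The resulting arrays are in the form that standard LP solvers accept.

```python
def build_lad_lp(basis_rows, ys):
    """Return (c, A_eq, b_eq) for z = (w_1..w_n, r+_1..r+_K, r-_1..r-_K), z >= 0."""
    K, n = len(basis_rows), len(basis_rows[0])
    c = [0.0] * n + [1.0] * (2 * K)          # objective: sum of r+ and r-
    A_eq, b_eq = [], []
    for k, (row, y) in enumerate(zip(basis_rows, ys)):
        coeffs = list(row) + [0.0] * (2 * K)
        coeffs[n + k] = -1.0                  # - r+_k
        coeffs[n + K + k] = 1.0               # + r-_k
        A_eq.append(coeffs)
        b_eq.append(y)
    A_eq.append([1.0] * n + [0.0] * (2 * K))  # assumed extra condition: sum(w) = 1
    b_eq.append(1.0)
    return c, A_eq, b_eq


def feasible_point(basis_rows, ys, w):
    """Split the residuals of a given w into positive and negative parts."""
    residuals = [sum(g * wi for g, wi in zip(row, w)) - y
                 for row, y in zip(basis_rows, ys)]
    r_plus = [max(r, 0.0) for r in residuals]
    r_minus = [max(-r, 0.0) for r in residuals]
    return list(w) + r_plus + r_minus


# Weighted arithmetic mean: the basis values g(x_k) are the inputs themselves.
G = [[0.2, 0.8], [0.6, 0.4], [1.0, 0.0]]
y = [0.5, 0.7, 0.3]
c, A_eq, b_eq = build_lad_lp(G, y)
z = feasible_point(G, y, [0.5, 0.5])
objective = sum(ci * zi for ci, zi in zip(c, z))
print(objective)  # equals the sum of absolute residuals for w = (0.5, 0.5)
```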


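Least squares with preservation of output rankings


$$\text{Minimize } \sum_{k=1}^{K} \big(\langle \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle - y_k\big)^2, \tag{5.8}$$

$$\text{s.t. } \langle \mathbf{g}(\mathbf{x}_{k+1}), \mathbf{w} \rangle - \langle \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle \ge 0, \quad k = 1,\dots,K-1,$$

other linear conditions on $\mathbf{w}$.


This is also a standard quadratic programming problem which differs from
(5.6) only by $K - 1$ linear constraints.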




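Least absolute deviation with preservation of output rankings


$$\text{Minimize } \sum_{k=1}^{K} \big(r_k^+ + r_k^-\big), \tag{5.9}$$

$$\text{s.t. } \langle \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle - r_k^+ + r_k^- = y_k, \quad k = 1,\dots,K,$$
$$\langle \mathbf{g}(\mathbf{x}_{k+1}) - \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle \ge 0, \quad k = 1,\dots,K-1,$$
$$r_k^+, r_k^- \ge 0, \quad k = 1,\dots,K,$$

other linear conditions on $\mathbf{w}$.


This is also a standard linear programming problem which differs from
(5.7) only by $K - 1$ linear constraints.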


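Fitting data and output rankings


$$\text{Minimize } \sum_{k=1}^{K} \big(\langle \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle - y_k\big)^2 + P \sum_{k=1}^{K-1} \max\{\langle \mathbf{g}(\mathbf{x}_k) - \mathbf{g}(\mathbf{x}_{k+1}), \mathbf{w} \rangle,\, 0\}, \tag{5.10}$$

s.t. other linear conditions on $\mathbf{w}$.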


This is no longer a quadratic programming problem. It is a nonsmooth 


but convex optimization problem, and there are efficient numerical methods 
for its solution ig, 1139), fod 


However, for the LAD criterion, we can preserve the structure of the LP,
namely we convert Problem (5.9) into an LP problem using auxiliary variables
$q_k$

$$\text{Minimize } \sum_{k=1}^{K} \big(r_k^+ + r_k^-\big) + P \sum_{k=1}^{K-1} q_k, \tag{5.11}$$

$$\text{s.t. } \langle \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle - r_k^+ + r_k^- = y_k, \quad k = 1,\dots,K,$$
$$q_k + \langle \mathbf{g}(\mathbf{x}_{k+1}) - \mathbf{g}(\mathbf{x}_k), \mathbf{w} \rangle \ge 0, \quad k = 1,\dots,K-1,$$
$$q_k, r_k^+, r_k^- \ge 0,$$

other linear conditions on $\mathbf{w}$.

5.4.1 Linearization of outputs


It is possible to treat weighted quasi-arithmetic means, generalized OWA,
generalized Choquet integrals and t-norms and t-conorms in the same framework,
by linearizing the outputs.

Let $h : [0,1] \to [-\infty, \infty]$ be a given continuous strictly monotone function.
A weighted quasi-arithmetic mean (see Section 2.3) is the function

$$f(\mathbf{x}; \mathbf{w}) = h^{-1}\Big(\sum_{i=1}^{n} w_i\, h(x_i)\Big).$$

By applying $h$ to the outputs $y_k$ we get a quadratic programming problem
which incorporates preservation of output rankings

$$\text{Minimize } \sum_{k=1}^{K} \big(\langle \mathbf{h}(\mathbf{x}_k), \mathbf{w} \rangle - h(y_k)\big)^2, \tag{5.12}$$

$$\text{s.t. } \langle \mathbf{h}(\mathbf{x}_{k+1}) - \mathbf{h}(\mathbf{x}_k), \mathbf{w} \rangle \ge 0, \quad k = 1,\dots,K-1,$$
$$\sum_{i} w_i = 1, \quad w_i \ge 0,$$

where $\mathbf{h}(\mathbf{x}) = (h(x_1),\dots,h(x_n))$. In the very same way we treat generalized
OWA and generalized Choquet integrals.
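The effect of linearization can be seen on a single observation. The sketch below is illustrative: it takes the generator $h(t) = t^2$ (the weighted quadratic mean) as an assumed example, and shows that the linearized residual $\langle \mathbf{h}(\mathbf{x}), \mathbf{w} \rangle - h(y)$ vanishes when $y$ is produced by the quasi-arithmetic mean itself, while remaining linear in the weights. The function names are hypothetical.

```python
import math


def quasi_arithmetic_mean(h, h_inv, w, x):
    """Weighted quasi-arithmetic mean h^{-1}(sum_i w_i h(x_i))."""
    return h_inv(sum(wi * h(xi) for wi, xi in zip(w, x)))


def linearized_residual(h, w, x, y):
    """Residual <h(x), w> - h(y) used after linearization; linear in w."""
    return sum(wi * h(xi) for wi, xi in zip(w, x)) - h(y)


def h(t):
    """Assumed generator for the example: h(t) = t^2 (quadratic mean)."""
    return t * t


w = [0.5, 0.5]
x = [0.6, 0.8]
y = quasi_arithmetic_mean(h, math.sqrt, w, x)
res = linearized_residual(h, w, x, y)
print(y, res)
```

Because the residual is linear in the weights, the same QP and LP formulations as for weighted arithmetic means apply once the inputs and outputs have been transformed by $h$.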

For t-norms and t-conorms, we shall use the method based on fitting
their additive generators, discussed in Section 3.4.15. We write the additive
generator in the form of a regression spline

$$S(t) = \langle \mathbf{B}(t), \mathbf{c} \rangle,$$

where $\mathbf{B}(t)$ is a vector of B-splines, and $\mathbf{c}$ is the vector of spline coefficients.
The conditions of monotonicity of $S$ are imposed through linear restrictions
on spline coefficients, and the additional conditions $S(1) = 0$, $S(0.5) = 1$ also
translate into linear equality constraints.

After applying $S$ to the output $y_k$ we obtain the least squares problem

$$\text{Minimize } \sum_{k=1}^{K} \Big(\sum_{i=1}^{n} \langle \mathbf{B}(x_{ki}), \mathbf{c} \rangle - \langle \mathbf{B}(y_k), \mathbf{c} \rangle\Big)^2 \tag{5.13}$$

s.t. linear restrictions on $\mathbf{c}$.

By rearranging the terms of the sum we get a QP problem

$$\text{Minimize } \sum_{k=1}^{K} \Big(\Big\langle \Big[\sum_{i=1}^{n} \mathbf{B}(x_{ki}) - \mathbf{B}(y_k)\Big], \mathbf{c} \Big\rangle\Big)^2 \tag{5.14}$$

s.t. linear restrictions on $\mathbf{c}$.

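Next we add preservation of outputs ordering conditions, to obtain the
following QP problem (note that the sign of inequality has changed because
$S$ is decreasing)

$$\text{Minimize } \sum_{k=1}^{K} \Big(\Big\langle \Big[\sum_{i=1}^{n} \mathbf{B}(x_{ki}) - \mathbf{B}(y_k)\Big], \mathbf{c} \Big\rangle\Big)^2 \tag{5.15}$$

$$\text{s.t. } \Big\langle \Big[\sum_{i=1}^{n} \mathbf{B}(x_{k+1,i}) - \sum_{i=1}^{n} \mathbf{B}(x_{ki})\Big], \mathbf{c} \Big\rangle \le 0, \quad k = 1,\dots,K-1,$$

linear restrictions on $\mathbf{c}$.


The LAD criterion similarly results in an LP problem.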

We see that various classes of aggregation functions allow a representa- 
tion in which they depend on the vector of parameters linearly, possibly after 
linearization. The problems of least squares, least absolute deviation, preserva- 
tion of output ordering and their combinations can all be treated in the same 
framework, and reduced to either standard quadratic or linear programming, 
which greatly facilitates their solution. 


5.5 Assessment of suitability 


When fitting aggregation functions to numerical data, it is also important to 
exclude various biases, due to the chosen class or family of functions. These 
biases can be due to inappropriate choice of family, or inconsistency of its 
properties with the data. We examine two useful tools which can be used for 
a posteriori analysis of the constructed aggregation function. 

The first method is subjective. It consists in plotting the predicted values
of the outputs against their observed values, as in Fig. 5.1. If the estimate
$f$ is unbiased, then the dots should lie along the diagonal of the square, and
on the diagonal for the data that are interpolated. How far the dots are from
the diagonal reflects the accuracy of approximation. However, ideally the dots
should be spread evenly above and below the diagonal. If this is not the case,
say the dots are grouped below the diagonal, this is a sign of systematic
underestimation of $y_k$, i.e., a bias. Similarly, dots above the diagonal mean
systematic overestimation.

For example, let the data be consistent with the averaging type aggregation,
and let us use a conjunctive aggregation function, like a t-norm, to fit it.
Clearly, the predicted output values are bounded from above by the min function,
and any t-norm will systematically underestimate the outputs. Based on
this plot, the user can visually detect whether a chosen class of aggregation
functions is really suitable for given data.

It is also possible to label the dots to see in which specific regions the
constructed aggregation function exhibits bias (e.g., only for small or only for
large values of $y_k$).

The second method is quantitative: it is based on statistical tests [286].
The simplest such test is to compute the correlation coefficient between the
predicted and observed values. For example, if the correlation is higher than 0.95,
then the aggregation function models the data well.

[Figure: two scatter plots of predicted (vertical axis) versus observed (horizontal axis) values.]

Fig. 5.1. Plots of predicted versus observed values. The plot on the left shows a
bias (systematic underestimation), the plot on the right shows no bias.


The second test computes the probability that the mean difference of the
predicted and observed values is not different from zero. Specifically, this test,
known as the one population mean test, computes the mean of the differences of
the predicted and observed values $m_K = \frac{1}{K} \sum_{k=1}^{K} (f(\mathbf{x}_k) - y_k)$ of the sample,
and compares it to 0. Of course, $m_K$ is a random variable, which should be
normally distributed around the mean of the population. The null hypothesis
is that the true mean (of the population) is indeed zero, and the observed
value of $m_K$ (of the sample) is different from zero due to finite sample size.
The alternative hypothesis is that the mean of the population is not zero, and
there is a systematic bias. The Student's (two-tailed) t-test can be used. It
requires the mean of the sample, the observed standard deviation $s$ and the
sample size $K$. The statistical test provides the P-value, i.e., the probability
of observing a value of the test statistic at least as large as the value
actually observed if the null hypothesis were true. Small P-values indicate
that the null hypothesis (the mean difference of the population is zero) is
very unlikely.


Example 5.6. Experimental data $y_k$ were compared to the values of the
min aggregation function. $K = 20$ is the sample size, and the computed mean
of the sample was $m_K = 0.052$ with the standard deviation $s = 0.067$ (the data
are plotted in Fig. 5.1 (left)). The test statistic is $t = 3.471$, with df = 19
(degrees of freedom), and the P-value was less than 0.01. We deduce that a
value of $t$ at least this large would occur with probability less than 0.01 if the
actual mean of the population were 0. Hence we reject the null hypothesis at the 1%
significance level, and conclude that min does not provide an unbiased estimate of the
aggregation function used by the subjects of the experiment.
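Both quantitative checks are easy to reproduce. The sketch below is an illustration, not a statistical package: it computes the one-sample t statistic from the summary figures of Example 5.6 and compares it with the tabulated two-tailed 1% critical value for 19 degrees of freedom (2.861, taken from standard t tables), since the Python standard library has no Student distribution. It also includes a plain Pearson correlation for the first test. All function names are illustrative.

```python
import math
import statistics


def t_statistic(mean_diff, std_dev, sample_size):
    """One-sample t statistic for the hypothesis that the true mean is zero."""
    return mean_diff / (std_dev / math.sqrt(sample_size))


def t_statistic_from_data(predicted, observed):
    """Same statistic, computed from raw predicted and observed values."""
    diffs = [p - o for p, o in zip(predicted, observed)]
    return t_statistic(statistics.mean(diffs), statistics.stdev(diffs), len(diffs))


def pearson(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Summary figures of Example 5.6: mean 0.052, standard deviation 0.067, K = 20.
t = t_statistic(0.052, 0.067, 20)
T_CRIT_1PCT_DF19 = 2.861  # tabulated two-tailed critical value, 19 degrees of freedom
reject_null = abs(t) > T_CRIT_1PCT_DF19
print(round(t, 3), reject_null)
```

The statistic agrees with the value quoted in the example, and exceeding the critical value corresponds to a P-value below 0.01.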


6 


Interpolatory Type Aggregation Functions 


6.1 Semantics 


Many classes of aggregation functions we have considered so far provide a 
large arsenal of tools to be used in specific applications. The parameters of 
various families and the vectors of weights can be adjusted to fit numerical 
data. Yet for certain applications these classes are not flexible enough to be 
fully consistent with the desired inputs-outputs. Sometimes the problem spec- 
ification is not sufficient to make a call as to what type of aggregation function 
must be used. 

In this Chapter we consider an alternative construction, which is based 
almost entirely on the empirical data, but also incorporates important appli- 
cation specific properties if required. The resulting aggregation function does 
not belong to any specific family: it is a general aggregation function, which, 
according to Definition 1.5 is simply a monotone non-decreasing function
satisfying $f(\mathbf{0}) = 0$ and $f(\mathbf{1}) = 1$. Furthermore, this function is not given by
a simple algebraic formula, but typically as an algorithm. Yet it is an ag- 
gregation function, and for computational purposes, which is the main use 
of aggregation functions, it is as good as an algebraic formula in terms of 
efficiency. 

Of course, having such a “black-box” function is not as transparent to the 
user of a system, nor is it easy to replicate calculations with pen and paper. 
Still many such black-box functions are routinely used in practice, for example 
neural networks for pattern recognition. 

What is important, though, is that the outputs of such an aggregation 
function are always consistent with the data and the required properties. 
This is not easy to achieve using standard off-the-shelf tools, such as neural 
network libraries. The issue here is that no application specific properties are 
incorporated into the construction process, and as a consequence, the solution 
may fail to satisfy them (even if the data does). For example, the function 
determined by a neural network may fail to be monotone, hence it is not an 
aggregation function.

In this Chapter we present a number of tools that do incorporate
monotonicity and other conditions. They are based on methods of monotone
multivariate interpolation and approximation, which are outlined in the
Appendix. The resulting functions are of interpolatory type: they are based
on interpolation of empirical data. Interpolatory constructions are very
recent; they have been studied in [15] and a number of related works.

The construction is formulated as the following problem. Given a set
(possibly uncountable) of values of an aggregation function f, construct f,
subject to a number of properties, which we discuss later in this Chapter.
The pointwise construction method results in an algorithm whose output is
the value of the aggregation function f at a given point $\mathbf{x} \in [0,1]^n$.

Continuous and Lipschitz-continuous aggregation functions are of particular
interest to us. Lipschitz aggregation functions are very important for
applications, because small errors in the inputs do not drastically affect
the behavior of the system. The concept of p-stable aggregation functions
was proposed earlier in the literature. These are precisely the Lipschitz
continuous aggregation functions whose Lipschitz constant M in the
$\|\cdot\|_p$ norm is one (see footnote 1).

The key parts of our approach are monotone interpolation techniques. We 
consider two methods: a method based on tensor-product monotone splines 
and a method based on Lipschitz interpolation. The latter method is specif- 
ically suitable for p-stable aggregation functions, and it also delivers the 
strongest, the weakest and the optimal aggregation functions with specified 
conditions. 

We now proceed with the mathematical formulation of the problem. Given
a data set $D = \{(\mathbf{x}_k, y_k)\}_{k=1}^K$, $\mathbf{x}_k \in [0,1]^n$, $y_k \in [0,1]$, and a number of
properties outlined below, construct an aggregation function f such that
$f(\mathbf{x}_k) = y_k$ and all the properties are satisfied. The mentioned properties
of an aggregation function define a class of functions $\mathcal{F}$, typically
consisting of more than just one function. Our goal is to ensure that
$f \in \mathcal{F}$, and if possible, that f is in some sense the “best” element of $\mathcal{F}$.


6.2 Construction based on spline functions 


6.2.1 Problem formulation 


Monotone tensor product splines are defined as

$$f_B(x_1,\ldots,x_n) = \sum_{j_1=1}^{J_1}\sum_{j_2=1}^{J_2}\cdots\sum_{j_n=1}^{J_n} c_{j_1 j_2 \ldots j_n} B_{j_1}(x_1) B_{j_2}(x_2)\cdots B_{j_n}(x_n). \tag{6.1}$$

The univariate basis functions are chosen to be linear combinations of
standard B-splines, which ensures that the conditions of monotonicity of
$f_B$ are expressed as linear conditions on the spline coefficients
$c_{j_1 j_2 \ldots j_n}$.

1 See p. 22 for definitions of various norms.

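The spline coefficients (there are $J_1 \times J_2 \times \ldots \times J_n$ of them, where $J_i$
is the number of basis functions with respect to the i-th variable) are
computed by solving a quadratic programming problem

$$\text{minimize } \sum_{k=1}^{K}\left(\sum_{j_1=1}^{J_1}\cdots\sum_{j_n=1}^{J_n} c_{j_1 j_2 \ldots j_n} B_{j_1}(x_{1k})\cdots B_{j_n}(x_{nk}) - y_k\right)^2, \tag{6.2}$$

subject to

$$\sum_{j_1=1}^{J_1}\cdots\sum_{j_n=1}^{J_n} c_{j_1 j_2 \ldots j_n} \ge 0,$$

and

$$f(\mathbf{0}) = \sum_{j_1=1}^{J_1}\cdots\sum_{j_n=1}^{J_n} c_{j_1 j_2 \ldots j_n} B_{j_1}(0)\cdots B_{j_n}(0) = 0,$$

$$f(\mathbf{1}) = \sum_{j_1=1}^{J_1}\cdots\sum_{j_n=1}^{J_n} c_{j_1 j_2 \ldots j_n} B_{j_1}(1)\cdots B_{j_n}(1) = 1.$$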

This problem involves very sparse matrices. For solving QP problems with
sparse matrices we recommend the OOQP sparse solver, see Appendix A.5.2.
In practice it is sufficient to use linear splines with 3-5 basis functions
($J_i = 3, 4, 5$), which gives a good quality approximation for 2-5 variables.
For more variables the method becomes impractical, because the number of
spline coefficients (and hence the sizes of all matrices) grows
exponentially with n. On the one hand this requires a large amount of data,
which is usually not available; on the other hand the required computing
time also becomes too large.
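
To make the exponential growth concrete, the coefficient count can be worked out for an illustrative choice of $J_i = 4$ basis functions per variable (the value 4 is only an example taken from the 3-5 range quoted in the text):

$$N = \prod_{i=1}^{n} J_i = 4^n: \qquad n = 2 \Rightarrow N = 16, \qquad n = 5 \Rightarrow N = 1024, \qquad n = 10 \Rightarrow N = 1\,048\,576.$$

Each coefficient is an unknown of the quadratic program (6.2), so ten variables would already need over a million unknowns.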


Example 6.1. Consider a real data set D of 22 input-output pairs taken from
the literature, and its approximation with a bivariate tensor-product
monotone spline with $J_1 = J_2 = 4$. The resulting aggregation function is
plotted in Fig. 6.1.


Note 6.2. Other tensor product monotone interpolation methods can be
applied to aggregation functions, although in most cases these methods are
limited to two variables. There are also alternative methods for
approximation of scattered data based on triangulations; in these methods
the basis functions are determined by the data. However, preservation of
monotonicity becomes rather complicated, and the available methods are only
suitable for the bivariate case.


6.2.2 Preservation of specific properties 


It is important to incorporate other problem specific information into the
construction of aggregation functions. Such information may be given in
terms of boundary conditions, conjunctive, disjunctive or averaging
behavior, symmetry and so on. In this section we describe the method from
[15], which accommodates these conditions for tensor product monotone
splines.

Fig. 6.1. Tensor-product spline approximation of the data from Example 6.1.
The data are marked with circles.


Symmetry 


There are two methods of imposing symmetry with tensor-product splines.
Consider the simplex $S = \{\mathbf{x} \in [0,1]^n \mid x_1 \ge x_2 \ge \ldots \ge x_n\}$ and a function
$\tilde f : S \to [0,1]$. We recall that the function $f : [0,1]^n \to [0,1]$ defined by
$f(\mathbf{x}) = \tilde f(\mathbf{x}_{\searrow})$ is symmetric ($\mathbf{x}_{\searrow}$ is obtained from $\mathbf{x}$ by arranging its
components in non-increasing order). Then in order to construct a symmetric
f, it is sufficient to construct $\tilde f$.

The first method, which we call implicit, consists in constructing the
tensor product splines on the whole domain, but based on the augmented data
set D, in which each pair $(\mathbf{x}_k, y_k)$ is replicated n! times, by using the same
$y_k$ and all possible permutations of the components of $\mathbf{x}_k$. A monotone spline
is then constructed by solving QP problem (6.2). The resulting function f
will be symmetric. Note that the symmetry of f is equivalent to the symmetry
of the array of spline coefficients.
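
The data augmentation step of the implicit method can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `augment_symmetric` and the representation of the data set as a list of `(x, y)` pairs are assumptions made here. Repeated permutations are dropped, which happens when some components of an input coincide, so fewer than n! copies may be produced.

```python
from itertools import permutations


def augment_symmetric(data):
    """Replicate each pair (x, y) over all distinct permutations of x.

    data: list of (x, y), where x is a tuple of n inputs in [0, 1].
    Hypothetical helper for illustration only.
    """
    augmented = []
    for x, y in data:
        # set() removes duplicates that arise when components of x repeat
        for perm in set(permutations(x)):
            augmented.append((perm, y))
    return augmented
```

The augmented list would then be passed to whatever routine fits the spline by solving (6.2).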

If the number of data becomes too large, it is possible to use the data on
the simplex S only (i.e., the data set D in which each $\mathbf{x}_k$ is replaced with
$\mathbf{x}_{k\searrow}$). In this case, the spline $\tilde f$ will approximate the data only on S, and
to calculate f(x) elsewhere one uses $f(\mathbf{x}) = \tilde f(\mathbf{x}_{\searrow})$.

The second method is explicit: it consists in reducing the number of basis
functions by a factor of n! and constructing the spline $\tilde f$ on the simplex S
using the data set D in which each $\mathbf{x}_k$ is replaced with $\mathbf{x}_{k\searrow}$. The definition
of the spline (6.1) is modified to include only the products of the basis
functions with the support on S. The problem (6.2) is also modified to make
use of the symmetry of the array of spline coefficients.


Neutral element


We recall the definition of the neutral element (p. 12), which implies that
for every $t \in [0,1]$ in any position it holds

$$f(e,\ldots,e,t,e,\ldots,e) = t. \tag{6.3}$$

Let us use the notation $\mathbf{e}(t,i) = (e,\ldots,e,t,e,\ldots,e)$ for some $e \in [0,1]$, with t
in the i-th position.

It has been shown that for a linear tensor product spline it is sufficient
to use the interpolation conditions $f(\mathbf{e}(t_j,i)) = t_j$ for $j = 1,\ldots,J_i$ and
$i = 1,\ldots,n$, where $t_j$ denote the knots of the spline. The conditions are
imposed for all the variables. These conditions are incorporated easily
into the problem (6.2) as linear equalities.


Idempotency 


The condition of idempotency $f(t,\ldots,t) = t$ is equivalent to the averaging
behavior of the aggregation function. For tensor-product splines this
condition can be enforced by using a number of interpolation conditions
$f(t_j,t_j,\ldots,t_j) = t_j$, $j = 1,\ldots,M$, but now $t_j$ are not the knots of the
splines. The values of $t_j$ can be chosen with relative freedom with
$M \ge n + 1$, in such a way as to match all n-dimensional rectangles (formed
by tensor products of intervals between consecutive spline knots) which
intersect the diagonal, see [17].

Fig. 6.2. Tensor-product spline approximation of empirical data marked with
circles. The conditions of symmetry and the neutral element e = 0 were
imposed. The data $f(0,t_j) = f(t_j,0) = t_j$, $j = 1,\ldots,5$ are marked with large
filled circles.


6.3 Construction based on Lipschitz interpolation


The method of monotone Lipschitz interpolation was proposed earlier and
subsequently applied to aggregation functions. Denote by Mon the set of
monotone non-decreasing functions on $[0,1]^n$. Then the set of general
Lipschitz n-ary aggregation functions with Lipschitz constant M is
characterized as

$$\mathcal{A}_{M,\|\cdot\|} = \{f \in Lip(M,\|\cdot\|) \cap Mon : f(\mathbf{0}) = 0, f(\mathbf{1}) = 1\}.$$

We assume that the data set is consistent with the class $\mathcal{A}_{M,\|\cdot\|}$. If not,
there are ways of smoothing the data, discussed in [21]. Our goal is to
determine the best element of $\mathcal{A}_{M,\|\cdot\|}$ which interpolates the data. The best
is understood in the sense of optimal interpolation: it is the function
which minimizes the worst case error, i.e., solves the following problem.

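Optimal interpolation problem

$$\min_{f \in \mathcal{A}_{M,\|\cdot\|}} \; \max_{g \in \mathcal{A}_{M,\|\cdot\|}} \; \max_{\mathbf{x} \in [0,1]^n} |f(\mathbf{x}) - g(\mathbf{x})|$$

$$\text{s.t. } f(\mathbf{x}_k) = y_k, \quad k = 1,\ldots,K.$$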


The solution to this problem will be an aggregation function f which is the 
“center” of the set of all possible aggregation functions in this class consistent 
with the data. 

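The method of computing f is based on the following result [21].


Theorem 6.3. Let D be a data set compatible with the conditions
$f \in Lip(M,\|\cdot\|) \cap Mon$. Then for any $\mathbf{x} \in [0,1]^n$, the values f(x) are bounded
by $\sigma_l(\mathbf{x}) \le f(\mathbf{x}) \le \sigma_u(\mathbf{x})$, with

$$\sigma_u(\mathbf{x}) = \min_k \{y_k + M\|(\mathbf{x} - \mathbf{x}_k)_+\|\},$$
$$\sigma_l(\mathbf{x}) = \max_k \{y_k - M\|(\mathbf{x}_k - \mathbf{x})_+\|\}, \tag{6.4}$$

where $\mathbf{z}_+$ denotes the positive part of vector $\mathbf{z}$: $\mathbf{z}_+ = (\bar z_1,\ldots,\bar z_n)$, with
$\bar z_i = \max\{z_i, 0\}$.


The optimal interpolant is given by

$$f(\mathbf{x}) = \frac{1}{2}\left(\sigma_l(\mathbf{x}) + \sigma_u(\mathbf{x})\right). \tag{6.5}$$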


Computation of the function f is straightforward: it requires computation
of both bounds, and all the functions $\sigma_l$, $\sigma_u$ and f belong to
$Lip(M,\|\cdot\|) \cap Mon$. Thus, in addition to the optimal function f, one obtains
as a by-product the strongest and the weakest aggregation functions from
the mentioned class.
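
The bounds (6.4) and the interpolant (6.5) translate almost literally into code. The following is a minimal sketch assuming a p-norm and NumPy arrays; the function names and the argument layout (`X` as a K-by-n array of abscissae, `y` as the K outputs) are illustrative choices made here, not part of the text.

```python
import numpy as np


def positive_part_norm(z, p):
    """p-norm of the positive part z_+ of a vector z (p may be np.inf)."""
    z = np.maximum(np.asarray(z, dtype=float), 0.0)
    if np.isinf(p):
        return float(np.max(z))
    return float(np.sum(z ** p) ** (1.0 / p))


def sigma_bounds(x, X, y, M, p=2.0):
    """Lower and upper bounds sigma_l(x), sigma_u(x) of Eq. (6.4)."""
    x = np.asarray(x, dtype=float)
    X = np.asarray(X, dtype=float)
    upper = min(yk + M * positive_part_norm(x - xk, p) for xk, yk in zip(X, y))
    lower = max(yk - M * positive_part_norm(xk - x, p) for xk, yk in zip(X, y))
    return lower, upper


def optimal_interpolant(x, X, y, M, p=2.0):
    """Midpoint of the two bounds, Eq. (6.5)."""
    lower, upper = sigma_bounds(x, X, y, M, p)
    return 0.5 * (lower + upper)
```

For instance, with the two made-up data points (0.2, 0.2) -> 0.2 and (0.8, 0.6) -> 0.7, M = 1 and p = 1, the bounds at x = (0.5, 0.5) are 0.3 and 0.7, so the interpolant returns 0.5.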

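It is also useful to consider infinite data sets

$$D = \{(\mathbf{t}, v(\mathbf{t})) : \mathbf{t} \in \Omega \subset [0,1]^n, \; v : \Omega \to [0,1]\},$$

in which case the bounds translate into

$$B_u(\mathbf{x}) = \inf_{\mathbf{t} \in \Omega} \{v(\mathbf{t}) + M\|(\mathbf{x} - \mathbf{t})_+\|\},$$
$$B_l(\mathbf{x}) = \sup_{\mathbf{t} \in \Omega} \{v(\mathbf{t}) - M\|(\mathbf{t} - \mathbf{x})_+\|\}. \tag{6.6}$$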





We will make use of these bounds when considering special properties of ag- 
gregation functions, such as idempotency or neutral element. 

The function f given in Theorem 6.3 is not yet an aggregation function,
because we did not take into account the conditions $f(\mathbf{0}) = 0$, $f(\mathbf{1}) = 1$.
By adding these conditions, we obtain the following generic construction of
Lipschitz aggregation functions

$$f(\mathbf{x}) = \frac{1}{2}\left(\underline{A}(\mathbf{x}) + \overline{A}(\mathbf{x})\right), \tag{6.7}$$

$$\underline{A}(\mathbf{x}) = \max\{\sigma_l(\mathbf{x}), B_l(\mathbf{x})\}, \qquad \overline{A}(\mathbf{x}) = \min\{\sigma_u(\mathbf{x}), B_u(\mathbf{x})\}, \tag{6.8}$$

where the additional bounds $B_l$ and $B_u$ are due to specific properties of
aggregation functions, considered in the next section. At the very least we
have (because of $f(\mathbf{0}) = 0$, $f(\mathbf{1}) = 1$)

$$B_u(\mathbf{x}) = \min\{M\|\mathbf{x}\|, 1\},$$
$$B_l(\mathbf{x}) = \max\{0, 1 - M\|\mathbf{1} - \mathbf{x}\|\}, \tag{6.9}$$

but other conditions will tighten these bounds. Figure 6.3 illustrates the
method of Lipschitz interpolation on empirical data.
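
Putting (6.4), (6.7), (6.8) and the universal bounds (6.9) together gives a complete pointwise evaluator. The sketch below assumes a finite p-norm and a data set that is consistent with the class; the function names and the use of plain Python lists for the data are assumptions made for illustration.

```python
import numpy as np


def pnorm_pos(z, p):
    """p-norm of the positive part of z (finite p only)."""
    z = np.maximum(np.asarray(z, dtype=float), 0.0)
    return float(np.sum(z ** p) ** (1.0 / p))


def lipschitz_aggregation(x, X, y, M=1.0, p=1.0):
    """Evaluate f(x) from Eqs. (6.7)-(6.9) for data (X[k], y[k])."""
    x = np.asarray(x, dtype=float)
    # data-driven bounds (6.4); an empty data set imposes no restriction
    sigma_u = min((yk + M * pnorm_pos(x - np.asarray(xk), p)
                   for xk, yk in zip(X, y)), default=np.inf)
    sigma_l = max((yk - M * pnorm_pos(np.asarray(xk) - x, p)
                   for xk, yk in zip(X, y)), default=-np.inf)
    # universal bounds (6.9) implied by f(0) = 0 and f(1) = 1
    b_u = min(M * pnorm_pos(x, p), 1.0)
    b_l = max(0.0, 1.0 - M * pnorm_pos(1.0 - x, p))
    a_upper = min(sigma_u, b_u)   # upper envelope in (6.8)
    a_lower = max(sigma_l, b_l)   # lower envelope in (6.8)
    return 0.5 * (a_lower + a_upper)
```

Tighter property-specific bounds can be substituted for `b_u` and `b_l` without changing the rest of the routine.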





Fig. 6.3. 3D plot of an optimal Lipschitz interpolatory aggregation function
based on empirical data. The data are marked with circles.

We note that as a special case of Equations (6.4)-(6.9) we obtain p-stable
aggregation functions (Definition 1.60, p. 22), which have Lipschitz constant
M = 1 in the norm $\|\cdot\|_p$. In this case the bounds (6.9) become the Yager
t-conorm and t-norm respectively (see p. 156), $B_u = S_p^Y$, $B_l = T_p^Y$.


Preservation of symmetry 


Symmetry can be imposed in a straightforward manner by ordering the inputs,
as discussed in Section 6.2.2. Namely, consider the simplex
$S = \{\mathbf{x} \in [0,1]^n \mid x_1 \ge x_2 \ge \ldots \ge x_n\}$ and a function $\tilde f : S \to [0,1]$. The function
$f : [0,1]^n \to [0,1]$ defined by $f(\mathbf{x}) = \tilde f(\mathbf{x}_{\searrow})$ is symmetric ($\mathbf{x}_{\searrow}$ is obtained
from $\mathbf{x}$ by arranging its components in non-increasing order). Then in order
to construct a symmetric f, it is sufficient to construct $\tilde f$.

To build $\tilde f$ we simply apply Eq. (6.8), with the bounds $\sigma_u$, $\sigma_l$ modified as

$$\sigma_u(\mathbf{x}) = \min_k \{y_k + M\|(\mathbf{x} - \mathbf{x}_{k\searrow})_+\|\},$$
$$\sigma_l(\mathbf{x}) = \max_k \{y_k - M\|(\mathbf{x}_{k\searrow} - \mathbf{x})_+\|\},$$

i.e., we order the abscissae of each datum in non-increasing order. There is
no need to modify any of the subsequent formulae for $B_u$, $B_l$, as long as
the conditions which define these bounds are themselves consistent with the
symmetry ($B_u$, $B_l$ will be automatically symmetric).


6.4 Preservation of specific properties 


In this section we develop tight upper and lower bounds, $B_l$ and $B_u$, on
various Lipschitz aggregation functions with specific properties. These
bounds apply irrespective of which interpolation or approximation method is
used, and must be taken into account by any construction algorithm. We
develop such bounds by using (6.6) with different choices of the subset $\Omega$,
corresponding to the indicated properties.

We already know that the bounds given in (6.9) apply universally to any
Lipschitz aggregation function with the Lipschitz constant M. However, the
set of general Lipschitz aggregation functions $\mathcal{A}_{M,\|\cdot\|}$ is very broad, and as
a consequence, the bounds in (6.9) define a wide range of possible values of
an aggregation function. Often in applications there are specific
properties that must be taken into account, and these properties may
substantially reduce the set of functions $\mathcal{A}_{M,\|\cdot\|}$, and consequently produce
tighter bounds. In this section we examine various generic properties, and
show how the corresponding bounds $B_l$ and $B_u$ are computed.


6.4.1 Conjunctive, disjunctive and averaging behavior 


We have the following restrictions on $[0,1]^n$:

• Conjunctive behavior implies $f \le \min$.
• Disjunctive behavior implies $f \ge \max$.
• Averaging behavior (or idempotency) implies $\min \le f \le \max$.


These bounds immediately translate into the following functions $B_u$, $B_l$ in
(6.8):


• Conjunctive aggregation ($M \ge 1$)

  $B_u(\mathbf{x}) = \min\{M\|\mathbf{x}\|, \min(\mathbf{x})\}$, $\quad B_l(\mathbf{x}) = \max\{0, 1 - M\|\mathbf{1} - \mathbf{x}\|\}$.

• Disjunctive aggregation ($M \ge 1$)

  $B_u(\mathbf{x}) = \min\{M\|\mathbf{x}\|, 1\}$, $\quad B_l(\mathbf{x}) = \max\{1 - M\|\mathbf{1} - \mathbf{x}\|, \max(\mathbf{x})\}$.

• Averaging aggregation

  $B_u(\mathbf{x}) = \min\{M\|\mathbf{x}\|, \max(\mathbf{x})\}$, $\quad B_l(\mathbf{x}) = \max\{1 - M\|\mathbf{1} - \mathbf{x}\|, \min(\mathbf{x})\}$.
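
The three pairs of bounds differ only in which of min(x) and max(x) is added to the universal bounds (6.9), which a short routine makes explicit. This is an illustrative sketch assuming a finite p-norm; the function name and the string labels for the behavior are hypothetical.

```python
import numpy as np


def pnorm(z, p):
    """Finite p-norm of a vector."""
    return float(np.sum(np.abs(np.asarray(z, dtype=float)) ** p) ** (1.0 / p))


def behavior_bounds(x, M, p, behavior):
    """Return (B_l, B_u) for conjunctive, disjunctive or averaging behavior."""
    x = np.asarray(x, dtype=float)
    b_u = min(M * pnorm(x, p), 1.0)              # universal upper bound (6.9)
    b_l = max(0.0, 1.0 - M * pnorm(1.0 - x, p))  # universal lower bound (6.9)
    if behavior == "conjunctive":
        b_u = min(b_u, float(x.min()))
    elif behavior == "disjunctive":
        b_l = max(b_l, float(x.max()))
    elif behavior == "averaging":
        b_u = min(b_u, float(x.max()))
        b_l = max(b_l, float(x.min()))
    else:
        raise ValueError("unknown behavior: " + str(behavior))
    return b_l, b_u
```

With M = 1 and p = 1 the conjunctive lower bound at (0.3, 0.8) is 0.1, which is the value of the Łukasiewicz t-norm at that point.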


6.4.2 Absorbing element 


The existence of an absorbing element $a \in [0,1]$ does not imply conjunctive
or disjunctive behavior on any part of the domain, but together with
monotonicity, it implies $f(\mathbf{x}) = a$ on $[a,1] \times [0,a]$ and $[0,a] \times [a,1]$ (and their
multivariate extensions).
Such restrictions are easily incorporated into the bounds by using

$$B_l(\mathbf{x}) = \max_{i=1,\ldots,n} B_l^i(\mathbf{x}), \qquad B_u(\mathbf{x}) = \min_{i=1,\ldots,n} B_u^i(\mathbf{x}), \tag{6.10}$$
$$B_l^i(\mathbf{x}) = a - M(a - x_i)_+,$$
$$B_u^i(\mathbf{x}) = a + M(x_i - a)_+.$$
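
The bounds (6.10) need only the individual components of x, so they are cheap to evaluate. A minimal sketch follows; the function name is an assumption made here.

```python
def absorbing_bounds(x, a, M):
    """Bounds (6.10) implied by an absorbing element a and Lipschitz constant M."""
    b_l = max(a - M * max(a - xi, 0.0) for xi in x)
    b_u = min(a + M * max(xi - a, 0.0) for xi in x)
    return b_l, b_u
```

When one component is at most a and another is at least a, both bounds collapse to a, which reproduces f(x) = a on that part of the domain.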


Note 6.4. The construction of aggregation functions with given absorbent
tuples, a generalization of the absorbing element, has also been developed.


6.4.3 Neutral element 


The existence of a neutral element is a stronger condition than conjunc- 
tive/disjunctive behavior, and consequently the bounds on the values of f are 
tighter. We shall see that calculation of these bounds depends on the Lipschitz 
constant and the norm used, and frequently requires solving an optimization 
problem. 

We recall the definition of the neutral element $e \in [0,1]$ (p. 12), which
implies that for every $t \in [0,1]$ in any position it holds

$$f(e,\ldots,e,t,e,\ldots,e) = t. \tag{6.11}$$

Let us use the notation $\mathbf{e}(t,i) = (e,\ldots,e,t,e,\ldots,e)$ with t in the i-th
position.

The bounds implied by the condition (6.11) are

$$B_u(\mathbf{x}) = \min_{i=1,\ldots,n} B_u^i(\mathbf{x}), \qquad B_l(\mathbf{x}) = \max_{i=1,\ldots,n} B_l^i(\mathbf{x}), \tag{6.12}$$

where for a fixed i the bounds are (from (6.6))

$$B_u^i(\mathbf{x}) = \min_{t \in [0,1]} \left(t + M\|(\mathbf{x} - \mathbf{e}(t,i))_+\|\right),$$
$$B_l^i(\mathbf{x}) = \max_{t \in [0,1]} \left(t - M\|(\mathbf{e}(t,i) - \mathbf{x})_+\|\right). \tag{6.13}$$

We need to translate these bounds into practically computable values,
for which we need to find the minimum/maximum with respect to t. Since
any norm is a convex function of its arguments, the expression we minimize
(maximize) is also convex (concave), and hence the minimum (maximum) is
unique. The following propositions establish these optima.


2 For conjunctive and disjunctive aggregation functions M necessarily
satisfies $M \ge 1$, since at the very least, for conjunctive functions
$M \ge f(1,\ldots,1) - f(0,1,\ldots,1) = 1 - 0 = 1$, and similarly for disjunctive
functions.


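Proposition 6.5. Given $e \in [0,1]$, $\mathbf{x} \in [0,1]^n$, $i \in \{1,\ldots,n\}$, $M \ge 1$, $p \ge 1$,
and a norm $\|\cdot\|_p$, let

$$f_{\mathbf{x},e}(t) = t + M\|((x_1 - e)_+,\ldots,(x_{i-1} - e)_+,(x_i - t)_+,(x_{i+1} - e)_+,\ldots,(x_n - e)_+)\|_p.$$

The minimum of $f_{\mathbf{x},e}$ is achieved at

• $t^* = 0$, if M = 1;
• $t^* = x_i$, if p = 1 and M > 1;
• $t^* = \mathrm{Med}\left\{0,\; x_i - \left(\frac{c(i)}{M^{\frac{p}{p-1}} - 1}\right)^{\frac{1}{p}},\; x_i\right\}$ otherwise,

and its value is

$$\min f_{\mathbf{x},e}(t) = \begin{cases} M\left(c(i) + x_i^p\right)^{\frac{1}{p}}, & \text{if } t^* = 0,\\[4pt] x_i + \left(M^{\frac{p}{p-1}} - 1\right)^{\frac{p-1}{p}} c(i)^{\frac{1}{p}}, & \text{if } t^* = x_i - \left(\frac{c(i)}{M^{\frac{p}{p-1}} - 1}\right)^{\frac{1}{p}},\\[4pt] x_i + M c(i)^{\frac{1}{p}}, & \text{if } t^* = x_i, \end{cases} \tag{6.14}$$

where $c(i) = \sum_{j \ne i} (x_j - e)_+^p$.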
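
A closed-form minimizer of this kind is easy to check numerically. The sketch below computes $t^*$ by the three cases of the proposition, evaluates the objective there, and lets the caller compare the result with a brute-force search over a grid. The function names are hypothetical and the index `i` is 0-based, unlike in the text.

```python
import numpy as np


def f_xe(t, x, e, i, M, p):
    """Objective t + M * ||((x_j - e)_+ for j != i, (x_i - t)_+)||_p."""
    z = np.maximum(np.asarray(x, dtype=float) - e, 0.0)
    z[i] = max(x[i] - t, 0.0)
    return t + M * float(np.sum(z ** p)) ** (1.0 / p)


def upper_bound_neutral_i(x, e, i, M, p):
    """Return (t_star, f_xe(t_star)) following the cases of Proposition 6.5."""
    x = np.asarray(x, dtype=float)
    xi = float(x[i])
    # c(i): sum over j != i of (x_j - e)_+^p
    c = float(np.sum(np.maximum(np.delete(x, i) - e, 0.0) ** p))
    if M == 1.0:
        t_star = 0.0
    elif p == 1.0:
        t_star = xi
    else:
        q = M ** (p / (p - 1.0)) - 1.0
        t_star = float(np.median([0.0, xi - (c / q) ** (1.0 / p), xi]))
    return t_star, f_xe(t_star, x, e, i, M, p)
```

For x = (0.6, 0.9), e = 0.5, M = 2, p = 2 and the first coordinate, the interior case applies and the value agrees with the middle branch of (6.14), $x_i + \sqrt{3}\, c(i)^{1/2} = 0.6 + 0.4\sqrt{3}$.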
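
Corollary 6.6. The upper bound on an aggregation function with neutral
element $e \in [0,1]$ and Lipschitz constant M takes the value

$$\overline{A}(\mathbf{x}) = \min_{i=1,\ldots,n} \{B_u^i(\mathbf{x}), \sigma_u(\mathbf{x})\},$$

where $B_u^i(\mathbf{x}) = f_{\mathbf{x},e}(t^*)$ is given by (6.14) and $\sigma_u$ is given by (6.4).

Proposition 6.7. Given $e \in [0,1]$, $\mathbf{x} \in [0,1]^n$, $i \in \{1,\ldots,n\}$, $M \ge 1$, $p \ge 1$,
and a norm $\|\cdot\|_p$, let

$$g_{\mathbf{x},e}(t) = t - M\|((e - x_1)_+,\ldots,(e - x_{i-1})_+,(t - x_i)_+,(e - x_{i+1})_+,\ldots,(e - x_n)_+)\|_p.$$

The maximum of $g_{\mathbf{x},e}(t)$ is achieved at

• $t^* = 1$, if M = 1;
• $t^* = x_i$, if p = 1 and M > 1, or
• $t^* = \mathrm{Med}\left\{x_i,\; x_i + \left(\frac{\tilde c(i)}{M^{\frac{p}{p-1}} - 1}\right)^{\frac{1}{p}},\; 1\right\}$ otherwise,

and its value is

$$\max g_{\mathbf{x},e}(t) = \begin{cases} x_i - M \tilde c(i)^{\frac{1}{p}}, & \text{if } t^* = x_i,\\[4pt] x_i - \left(M^{\frac{p}{p-1}} - 1\right)^{\frac{p-1}{p}} \tilde c(i)^{\frac{1}{p}}, & \text{if } t^* = x_i + \left(\frac{\tilde c(i)}{M^{\frac{p}{p-1}} - 1}\right)^{\frac{1}{p}},\\[4pt] 1 - M\left(\tilde c(i) + (1 - x_i)^p\right)^{\frac{1}{p}}, & \text{if } t^* = 1, \end{cases} \tag{6.15}$$

where $\tilde c(i) = \sum_{j \ne i} (e - x_j)_+^p$.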

Corollary 6.8. The lower bound on an aggregation function with neutral
element $e \in [0,1]$ and Lipschitz constant M takes the value

$$\underline{A}(\mathbf{x}) = \max_{i=1,\ldots,n} \{B_l^i(\mathbf{x}), \sigma_l(\mathbf{x})\},$$

where $B_l^i(\mathbf{x}) = g_{\mathbf{x},e}(t^*)$ is given by (6.15) and $\sigma_l$ is given by (6.4).


Note 6.9. Recently the construction of aggregation functions with given
neutral tuples, a generalization of the neutral element, has been developed
in [29].


6.4.4 Mixed behavior 


In this section we develop bounds specific to the following types of mixed
aggregation functions, for a given $\mathbf{e} = (e,e,\ldots,e)$.


I. f is conjunctive for $\mathbf{x} \le \mathbf{e}$ and disjunctive for $\mathbf{x} \ge \mathbf{e}$;
II. f is disjunctive for $\mathbf{x} \le \mathbf{e}$ and conjunctive for $\mathbf{x} \ge \mathbf{e}$;
III. f is disjunctive for $\mathbf{x} \le \mathbf{e}$ and idempotent for $\mathbf{x} \ge \mathbf{e}$;
IV. f is conjunctive for $\mathbf{x} \le \mathbf{e}$ and idempotent for $\mathbf{x} \ge \mathbf{e}$;
V. f is idempotent for $\mathbf{x} \le \mathbf{e}$ and disjunctive for $\mathbf{x} \ge \mathbf{e}$;
VI. f is idempotent for $\mathbf{x} \le \mathbf{e}$ and conjunctive for $\mathbf{x} \ge \mathbf{e}$.


We take for convenience $\mathbf{e} = (\frac{1}{2},\ldots,\frac{1}{2})$, but we note that e is not the
neutral element of f (the existence of the neutral element is not necessary).
We will not require specific behavior for the values of x in the rest of the
domain $R = [0,1]^n \setminus ([0,e]^n \cup [e,1]^n)$. The reason is that in most cases the
restrictions on that part of the domain will follow automatically.

We also note that cases V and VI are symmetric to cases IV and III
respectively, and the results are obtained by duality.

Case I: f is conjunctive for $\mathbf{x} \le \mathbf{e}$ and disjunctive for $\mathbf{x} \ge \mathbf{e}$


In this case the behavior of the aggregation function is similar to that of a
uninorm (see the earlier figure on uninorms), but no associativity is
required. Yager calls such a class of functions generalized uninorms
(GenUNI). However, our conditions are weaker, since we require neither
symmetry nor the neutral element e = 1/2. If we did require the neutral
element e, the aggregation function would have the bounds given by
Corollaries 6.6 and 6.8 in Section 6.4.3. Finally we note that the bounds and
the optimal aggregation function are Lipschitz continuous (cf. uninorms).

On $[0,e]^n$ f is bounded from above by the minimum, and on $[e,1]^n$ it is
bounded from below by the maximum. This implies $M \ge 1$. Let us examine the
bounds on the rest of the domain R. Consider the lower bound. The bounds
on $[0,e]^n$ imply only the trivial bound $0 \le f(\mathbf{x})$ elsewhere. However, since
on $[e,1]^n$ we have $f(\mathbf{x}) \ge \max(\mathbf{x})$, this implies

$$f(\mathbf{x}) \ge \max_{\mathbf{z} \in [e,1]^n} \left(\max(\mathbf{z}) - M\|(\mathbf{z} - \mathbf{x})_+\|\right).$$


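After some technical calculations we obtain

$$f(\mathbf{x}) \ge \max_{t \in [e,1]} \left(t - M\|(\max\{0, e - x_1\},\ldots,\max\{0, e - x_{k-1}\},(t - x_k)_+,\max\{0, e - x_{k+1}\},\ldots,\max\{0, e - x_n\})\|\right) \tag{6.16}$$

for some $e \in [0,1]$, where $x_k = \max_{i=1,\ldots,n}\{x_i\}$.
Applying Proposition 6.7, the point of maximum is

• $t^* = 1$, if M = 1;
• $t^* = x_k$, if p = 1 and M > 1, or
• $t^* = \mathrm{Med}\left\{x_k,\; x_k + \left(\frac{K}{M^{\frac{p}{p-1}} - 1}\right)^{\frac{1}{p}},\; 1\right\}$ otherwise,

with $K = \sum_{i \ne k} \max\{0, e - x_i\}^p$. Thus the lower bound $B_l(\mathbf{x})$ is

$$B_l(\mathbf{x}) = \begin{cases} x_k - M K^{\frac{1}{p}}, & \text{if } t^* = x_k,\\[4pt] x_k - \left(M^{\frac{p}{p-1}} - 1\right)^{\frac{p-1}{p}} K^{\frac{1}{p}}, & \text{if } t^* = x_k + \left(\frac{K}{M^{\frac{p}{p-1}} - 1}\right)^{\frac{1}{p}},\\[4pt] 1 - M\left(K + (1 - x_k)^p\right)^{\frac{1}{p}}, & \text{if } t^* = 1. \end{cases} \tag{6.17}$$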


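Similarly, the fact that f is bounded from above by the minimum on $[0,e]^n$
implies the following upper bound on the rest of the domain R

$$f(\mathbf{x}) \le \min_{\mathbf{z} \in [0,e]^n} \left(\min(\mathbf{z}) + M\|(\mathbf{x} - \mathbf{z})_+\|\right),$$

which translates into

$$f(\mathbf{x}) \le \min_{t \in [0,e]} \left(t + M\|(\max\{0, x_1 - e\},\ldots,\max\{0, x_{j-1} - e\},(x_j - t)_+,\max\{0, x_{j+1} - e\},\ldots,\max\{0, x_n - e\})\|\right), \tag{6.18}$$

where $x_j = \min_{i=1,\ldots,n}\{x_i\}$. By applying Proposition 6.5 the minimizer is
given by

• $t^* = 0$, if M = 1;
• $t^* = x_j$, if p = 1 and M > 1;
• $t^* = \mathrm{Med}\left\{0,\; x_j - \left(\frac{K}{M^{\frac{p}{p-1}} - 1}\right)^{\frac{1}{p}},\; x_j\right\}$ otherwise,

and the upper bound is

$$B_u(\mathbf{x}) = \begin{cases} M\left(K + x_j^p\right)^{\frac{1}{p}}, & \text{if } t^* = 0,\\[4pt] x_j + \left(M^{\frac{p}{p-1}} - 1\right)^{\frac{p-1}{p}} K^{\frac{1}{p}}, & \text{if } t^* = x_j - \left(\frac{K}{M^{\frac{p}{p-1}} - 1}\right)^{\frac{1}{p}},\\[4pt] x_j + M K^{\frac{1}{p}}, & \text{if } t^* = x_j, \end{cases} \tag{6.19}$$

where $K = \sum_{i \ne j} \max\{0, x_i - e\}^p$.
Summarizing, for a mixed aggregation function with conjunctive/disjunctive
behavior the bounds are

$$\begin{array}{ll} 0 \le f(\mathbf{x}) \le \min(\mathbf{x}), & \text{if } \mathbf{x} \in [0,e]^n,\\ \max(\mathbf{x}) \le f(\mathbf{x}) \le 1, & \text{if } \mathbf{x} \in [e,1]^n,\\ B_l(\mathbf{x}) \le f(\mathbf{x}) \le B_u(\mathbf{x}) & \text{elsewhere,} \end{array} \tag{6.20}$$

with $B_l$, $B_u$ given by (6.17) and (6.19).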
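Case II: f is disjunctive for $\mathbf{x} \le \mathbf{e}$ and conjunctive for $\mathbf{x} \ge \mathbf{e}$


In this case $f \ge \max$ on $[0,e]^n$ and $f \le \min$ on $[e,1]^n$. We immediately obtain
that f has the absorbing element a = e, i.e., $\forall \mathbf{x} \in [0,1]^n$, $i \in \{1,\ldots,n\}$:
$f(\mathbf{x}) = e$ whenever any $x_i = e$. Such a function has a similar structure to
nullnorms, but need not be associative. It follows that $f(\mathbf{x}) = e$ for vectors
whose components are neither all smaller nor all bigger than e. Thus the
bounds are

$$\begin{array}{ll} \max(\mathbf{x}) \le f(\mathbf{x}) \le e, & \text{if } \mathbf{x} \in [0,e]^n,\\ e \le f(\mathbf{x}) \le \min(\mathbf{x}), & \text{if } \mathbf{x} \in [e,1]^n,\\ f(\mathbf{x}) = e, & \text{elsewhere.} \end{array} \tag{6.21}$$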


Case III: f is disjunctive for $\mathbf{x} \le \mathbf{e}$ and idempotent for $\mathbf{x} \ge \mathbf{e}$


In this case, f is bounded by the maximum from below for $\mathbf{x} \le \mathbf{e}$, and is
bounded by the minimum and maximum for $\mathbf{x} \ge \mathbf{e}$. This implies that e is the
lower bound for all $\mathbf{x} \in R$. At the same time, since f is bounded from above
by the maximum for all $\mathbf{x} \ge \mathbf{e}$, it will have the same bound for $\mathbf{x} \in R$ due to
monotonicity. Thus the bounds are

$$\begin{array}{ll} \max(\mathbf{x}) \le f(\mathbf{x}) \le e, & \text{if } \mathbf{x} \in [0,e]^n,\\ \min(\mathbf{x}) \le f(\mathbf{x}) \le \max(\mathbf{x}), & \text{if } \mathbf{x} \in [e,1]^n,\\ e \le f(\mathbf{x}) \le \max(\mathbf{x}), & \text{elsewhere.} \end{array} \tag{6.22}$$
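
Because the bounds (6.22) depend only on the smallest and largest inputs, they can be coded as a three-way case distinction. The following is a small sketch with a hypothetical function name; it returns the pair (lower bound, upper bound).

```python
def case3_bounds(x, e):
    """Bounds (6.22): disjunctive on [0,e]^n, idempotent on [e,1]^n."""
    lo, hi = min(x), max(x)
    if hi <= e:          # x lies in [0, e]^n
        return hi, e
    if lo >= e:          # x lies in [e, 1]^n
        return lo, hi
    return e, hi         # rest of the domain R
```

For example, with e = 0.5 the point (0.2, 0.9) lies in R and its bounds are (0.5, 0.9).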


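Case IV: f is conjunctive for $\mathbf{x} \le \mathbf{e}$ and idempotent for $\mathbf{x} \ge \mathbf{e}$


In this case we obtain the bounds

$$\begin{array}{ll} 0 \le f(\mathbf{x}) \le \min(\mathbf{x}), & \text{if } \mathbf{x} \in [0,e]^n,\\ \min(\mathbf{x}) \le f(\mathbf{x}) \le \max(\mathbf{x}), & \text{if } \mathbf{x} \in [e,1]^n,\\ B_l(\mathbf{x}) \le f(\mathbf{x}) \le \tilde B_u(\mathbf{x}), & \text{elsewhere,} \end{array} \tag{6.23}$$

where $\tilde B_u$ is given as

$$\tilde B_u(\mathbf{x}) = \min\{B_u(\mathbf{x}), \max(\mathbf{x})\},$$

and $B_u(\mathbf{x})$ is the expression in (6.19). $B_l$ is given as

$$B_l(\mathbf{x}) = \max_{t \in [e,1]} \left(t - M\|((t - x_1)_+,\ldots,(t - x_n)_+)\|\right) = e - M\|(\mathbf{e} - \mathbf{x})_+\|.$$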


6.4.5 Given diagonal and opposite diagonal 


Denote by $\delta(t) = f(t,t,\ldots,t)$ the diagonal section of the n-ary aggregation
function f. If $f \in \mathcal{A}_{M,\|\cdot\|_p}$, then $\delta \in Lip(Mn^{1/p}, \|\cdot\|_p)$. Also $\delta$ is
non-decreasing, and $\delta(0) = 0$, $\delta(1) = 1$. We denote by $\omega(t) = f(t, 1 - t)$ the
opposite diagonal section of a bivariate aggregation function. We note that
$\omega \in Lip(M, \|\cdot\|)$. We assume that the functions $\delta$, $\omega$ are given and they have
the required Lipschitz properties.


Diagonal section

From (6.6) it follows that

$$B_u(\mathbf{x}) = \min_{t \in [0,1]} \left(\delta(t) + M\|((x_1 - t)_+,\ldots,(x_n - t)_+)\|\right),$$
$$B_l(\mathbf{x}) = \max_{t \in [0,1]} \left(\delta(t) - M\|((t - x_1)_+,\ldots,(t - x_n)_+)\|\right). \tag{6.24}$$

For the purposes of computing the values of $B_u(\mathbf{x})$, $B_l(\mathbf{x})$ we need to
develop suitable algorithms to solve the optimization problems in (6.24)
numerically. Since the function $\delta(t)$ is fairly arbitrary (we only require
$\delta \in Lip(Mn^{1/p}, \|\cdot\|_p) \cap Mon$), the overall expression may possess a number of
local minima. Calculation of the bounds requires the global optimum, and
thus we need to use a global optimization technique, see Appendix A.5.5. We
recommend using the Pijavski-Shubert method of deterministic global
optimization.

To apply the Pijavski-Shubert algorithm we need to estimate the Lipschitz
constant of the objective function. Since $\delta \in Lip(Mn^{1/p}, \|\cdot\|_p)$ and is
non-decreasing, and the function

$$M\|((x_1 - t)_+,\ldots,(x_n - t)_+)\|$$

is in $Lip(Mn^{1/p}, \|\cdot\|_p)$ and is non-increasing (we can prove this with the
help of the inequality $\|\mathbf{x}\|_p \le n^{1/p}\|\mathbf{x}\|_\infty$), the Lipschitz constant of the sum
is $Mn^{1/p}$. Hence we use the Pijavski-Shubert algorithm with this parameter.
In the special case of bivariate 1-Lipschitz functions (i.e., n = 2, p = 1,
M = 1) we have

$$B_u(\mathbf{x}) = \max(x_1, x_2) + \min_{t \in [\alpha,\beta]} (\delta(t) - t),$$
$$B_l(\mathbf{x}) = \min(x_1, x_2) + \max_{t \in [\alpha,\beta]} (\delta(t) - t),$$

where $\alpha = \min(x_1, x_2)$, $\beta = \max(x_1, x_2)$.
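
The diagonal bounds (6.24) can be evaluated with a basic saw-tooth (Pijavski-Shubert) search, and the special-case formula above gives a convenient check. The code below is a sketch under stated assumptions rather than the implementation referred to in the Appendix: the function names, the tolerance, the iteration cap and the use of a heap are choices made here, and a finite p-norm is assumed.

```python
import heapq


def pijavski_shubert_min(g, a, b, lipschitz, tol=1e-4, max_iter=20000):
    """Approximate the global minimum value of a Lipschitz function g on [a, b]."""
    def cell(t0, g0, t1, g1):
        # lowest point of the saw-tooth underestimate between two samples
        lower = 0.5 * (g0 + g1) - 0.5 * lipschitz * (t1 - t0)
        t_new = 0.5 * (t0 + t1) + (g0 - g1) / (2.0 * lipschitz)
        return (lower, t_new, t0, g0, t1, g1)

    ga, gb = g(a), g(b)
    best = min(ga, gb)
    heap = [cell(a, ga, b, gb)]
    for _ in range(max_iter):
        lower, t_new, t0, g0, t1, g1 = heapq.heappop(heap)
        if best - lower <= tol:   # no cell can improve on best by more than tol
            break
        g_new = g(t_new)
        best = min(best, g_new)
        heapq.heappush(heap, cell(t0, g0, t_new, g_new))
        heapq.heappush(heap, cell(t_new, g_new, t1, g1))
    return best


def diagonal_bounds(x, delta, M=1.0, p=1.0, tol=1e-4):
    """Bounds (6.24) for a given diagonal section delta, finite p-norm."""
    n = len(x)
    lip = M * n ** (1.0 / p)      # Lipschitz constant of both objectives

    def norm_pos(v):
        return sum(max(c, 0.0) ** p for c in v) ** (1.0 / p)

    def upper_obj(t):
        return delta(t) + M * norm_pos([xi - t for xi in x])

    def neg_lower_obj(t):         # maximization done by minimizing the negative
        return -(delta(t) - M * norm_pos([t - xi for xi in x]))

    b_u = pijavski_shubert_min(upper_obj, 0.0, 1.0, lip, tol)
    b_l = -pijavski_shubert_min(neg_lower_obj, 0.0, 1.0, lip, tol)
    return b_l, b_u
```

As a check, take the diagonal $\delta(t) = t^2$ (that of the product) with n = 2, p = 1, M = 1 and x = (0.3, 0.7). The special-case formula gives $B_u = 0.7 + \min_{t \in [0.3,0.7]}(t^2 - t) = 0.45$ and $B_l = 0.3 + \max_{t \in [0.3,0.7]}(t^2 - t) = 0.09$, and the search should reproduce both values to within the tolerance. Objectives with long flat stretches may hit the iteration cap before the tolerance is certified.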
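For $p \to \infty$ a similar formula works for any dimension n. We have

$$B_u(\mathbf{x}) = \min_{t \in [0,\beta]} \left(\delta(t) + M(\max\{x_i\} - t)\right) = M\max\{x_i\} + \min_{t \in [0,\beta]} (\delta(t) - Mt),$$

and

$$B_l(\mathbf{x}) = \max_{t \in [\alpha,1]} \left(\delta(t) - M(t - \min\{x_i\})\right) = M\min\{x_i\} + \max_{t \in [\alpha,1]} (\delta(t) - Mt),$$

where $\alpha = \min\{x_i\}$, $\beta = \max\{x_i\}$.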


Opposite diagonal 


We consider bivariate aggregation functions with given $\omega(t) = f(t, 1 - t)$. The
bounds are computed as

$$B_u(\mathbf{x}) = \min_{t \in [0,1]} \left(\omega(t) + M\|((x_1 - t)_+, (t - (1 - x_2))_+)\|\right),$$
$$B_l(\mathbf{x}) = \max_{t \in [0,1]} \left(\omega(t) - M\|((t - x_1)_+, (1 - x_2 - t)_+)\|\right). \tag{6.25}$$

We notice that $\omega \in Lip(M, \|\cdot\|)$ and so is the second term in the expression,
hence the objective function is in $Lip(2M, \|\cdot\|)$. We apply the Pijavski-Shubert
method with this Lipschitz parameter to calculate the values of the bounds
for any x.

As a special case the following bounds were provided for bivariate
1-Lipschitz increasing functions

$$B_u(\mathbf{x}) = T_L(\mathbf{x}) + \min_{t \in [\alpha,\beta]} \omega(t),$$
$$B_l(\mathbf{x}) = S_L(\mathbf{x}) - 1 + \max_{t \in [\alpha,\beta]} \omega(t), \tag{6.26}$$

where $\alpha = \min\{x_1, 1 - x_2\}$, $\beta = \max\{x_1, 1 - x_2\}$, and $T_L$, $S_L$ denote the
Łukasiewicz t-norm and t-conorm respectively.

Example 6.10. Optimal Lipschitz aggregation functions with diagonal sections
Slt) = t4, dof) = min(1, 15) 

are plotted in Fig. 6.4.


Example 6.11. Optimal Lipschitz aggregation functions with opposite diagonal
sections

$$\omega(t) = -t^2 + t + 0.25, \qquad \omega(t) = \min(t(1 - t), 0.2)$$

are plotted in Fig. 6.5.


Example 6.12. An optimal Lipschitz aggregation function with opposite
diagonal section

$$\omega(t) = \min(t, (1 - t))$$

is plotted in Fig. 6.6.





Fig. 6.4. 3D plots of optimal Lipschitz interpolatory aggregation functions with
given diagonal sections in Example 6.10.


6.4.6 Given marginals 


We consider the problem of obtaining an aggregation function f when certain
functions are required to be its marginals. For some special cases of
1-Lipschitz aggregation functions this problem has been treated previously;
the general case is presented below.

Fig. 6.5. 3D plots of optimal Lipschitz interpolatory aggregation functions with
given opposite diagonal sections in Example 6.11.

Fig. 6.6. 3D plot of an optimal Lipschitz interpolatory aggregation function with
a given opposite diagonal section in Example 6.12.


Definition 6.13. Let $f : [0,1]^n \to [0,1]$ be a function. Its restriction to a
subset $\Omega = \{\mathbf{x} \in [0,1]^n \mid x_i = 0, i \in \mathcal{I}, \; x_j = 1, j \in \mathcal{J}, \; \mathcal{I} \cap \mathcal{J} = \emptyset\}$ for some
$\mathcal{I}, \mathcal{J} \subset \{1,\ldots,n\}$ is called a marginal function (a marginal for short; see
footnote 3).


Geometrically, the domain of a marginal is a facet of the unit cube. If
$\mathcal{I} = \{1,\ldots,i-1,i+1,\ldots,n\}$ and $\mathcal{J} = \emptyset$, or vice versa, $\gamma : [0,1] \to [0,1]$ is
called the i-th marginal. Evidently, a marginal is a function
$\gamma : [0,1]^m \to [0,1]$, $m = n - |\mathcal{I}| - |\mathcal{J}| < n$.


3 In probability theory, a marginal probability distribution density is
obtained by integrating the density $\rho(\mathbf{x})$ over all the variables $x_i$ in
$\mathcal{I} \cup \mathcal{J}$. This is equivalent to the restriction of the probability
distribution function $P(\mathbf{x})$ to the set $\Omega$ in Definition 6.13 with $\mathcal{I} = \emptyset$, in the
special case of the domain of P being $[0,1]^n$.

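Consider the construction of a bivariate Lipschitz aggregation function
$f(x_1, x_2)$ based on a given marginal $\gamma$, defined on some closed subset $\Omega$, for
example $\Omega = \{\mathbf{x} = (x_1, x_2) \mid 0 \le x_1 \le 1, x_2 = 0\}$. Let $\gamma \in Lip(M_\gamma, \|\cdot\|)$. Then
obviously the Lipschitz constant of f, M, verifies $M \ge M_\gamma$. From (6.6) we
obtain

$$B_u(\mathbf{x}) = \min_{t \in [0,1]} \left(\gamma(t) + M\|((x_1 - t)_+, x_2)\|\right) = \min_{t \in [0,x_1]} \left(\gamma(t) + M\|(x_1 - t, x_2)\|\right), \tag{6.27}$$
$$B_l(\mathbf{x}) = \max_{t \in [0,1]} \left(\gamma(t) - M\|((t - x_1)_+, 0)\|\right) = \gamma(x_1).$$

If the marginal is given on $\Omega = \{\mathbf{x} = (x_1, x_2) \mid 0 \le x_1 \le 1, x_2 = 1\}$, then
the bounds are

$$B_u(\mathbf{x}) = \min_{t \in [0,1]} \left(\gamma(t) + M\|((x_1 - t)_+, 0)\|\right) = \gamma(x_1),$$
$$B_l(\mathbf{x}) = \max_{t \in [0,1]} \left(\gamma(t) - M\|((t - x_1)_+, 1 - x_2)\|\right) = \max_{t \in [x_1,1]} \left(\gamma(t) - M\|(t - x_1, 1 - x_2)\|\right). \tag{6.28}$$

To solve the optimization problem in each case we apply the Pijavski-Shubert
method with the Lipschitz parameter M.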

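For the general multivariate case the equations are as follows. Let $\gamma_i$,
$i = 1,\ldots,n$ be a function from $Lip(M_\gamma, \|\cdot\|)$ representing the i-th marginal

$$\forall \mathbf{x} \in \Omega_i : f(\mathbf{x}) = \gamma_i(x_i), \qquad \Omega_i = \{\mathbf{x} \in [0,1]^n \mid x_i \in [0,1], \; x_j = 0, \; j \ne i\}.$$

The bounds resulting from the i-th marginal are

$$B_u^i(\mathbf{x}) = \min_{t \in [0,x_i]} \left(\gamma_i(t) + M\|(x_1,\ldots,x_{i-1},(x_i - t)_+,x_{i+1},\ldots,x_n)\|\right),$$
$$B_l^i(\mathbf{x}) = \gamma_i(x_i).$$

The same approach applies to the construction of n-variate aggregation
functions from m-variate marginals, as exemplified below. Let
$\gamma : [0,1]^m \to [0,1]$ denote a marginal of f: $\forall \mathbf{x} \in \Omega : f(\mathbf{x}) = \gamma(\mathbf{y})$, with

$$\Omega = \{\mathbf{x} \in [0,1]^n \mid x_1,\ldots,x_m \in [0,1], \; x_{m+1} = \ldots = x_n = 0\}$$

and $\mathbf{y} \in [0,1]^m$, $y_i = x_i$, $i = 1,\ldots,m$. Then the upper and lower bounds on
$f(\mathbf{x})$, $\mathbf{x} \in [0,1]^n \setminus \Omega$ are

$$B_u(\mathbf{x}) = \min_{\mathbf{z} \in [0,x_1] \times \ldots \times [0,x_m]} \left[\gamma(\mathbf{z}) + M\|((x_1 - z_1)_+,\ldots,(x_m - z_m)_+, x_{m+1},\ldots,x_n)\|\right],$$
$$B_l(\mathbf{x}) = \gamma(x_1,\ldots,x_m).$$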
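
Table 6.1. The lower and upper bounds $B_l$, $B_u$ on Lipschitz aggregation functions
with Lipschitz constant M and the listed properties.

| Properties | Lower bound $B_l(\mathbf{x})$ | Upper bound $B_u(\mathbf{x})$ |
|---|---|---|
| Conjunctive function, $M \ge 1$ | $\max\{1 - M\lVert\mathbf{1} - \mathbf{x}\rVert, 0\}$ | $\min(\mathbf{x})$ |
| Disjunctive function, $M \ge 1$ | $\max(\mathbf{x})$ | $\min\{M\lVert\mathbf{x}\rVert, 1\}$ |
| Idempotent function | $\max\{1 - M\lVert\mathbf{1} - \mathbf{x}\rVert, \min(\mathbf{x})\}$ | $\min\{M\lVert\mathbf{x}\rVert, \max(\mathbf{x})\}$ |
| Neutral element $e \in [0,1]$, $M \ge 1$ | $\max_i B_l^i(\mathbf{x})$, with $B_l^i$ from (6.15) | $\min_i B_u^i(\mathbf{x})$, with $B_u^i$ from (6.14) |
| Neutral element e = 1, M = 1 | $\max\{1 - \lVert\mathbf{1} - \mathbf{x}\rVert, 0\}$ | $\min(\mathbf{x})$ |
| Neutral element e = 0, M = 1 | $\max(\mathbf{x})$ | $\min\{\lVert\mathbf{x}\rVert, 1\}$ |
| Diagonal section $\delta(t) = f(t,\ldots,t)$ | given by (6.24) | given by (6.24) |
| Diagonal section $\delta(t) = f(t,t)$, p = 1, n = 2; $\alpha = \min\{x_1, x_2\}$, $\beta = \max\{x_1, x_2\}$ | $M\min\{x_1, x_2\} + \max_{t \in [\alpha,\beta]} (\delta(t) - Mt)$ | $M\max\{x_1, x_2\} + \min_{t \in [\alpha,\beta]} (\delta(t) - Mt)$ |
| Diagonal section $\delta(t) = f(t,\ldots,t)$, $p \to \infty$; $\alpha = \min(\mathbf{x})$, $\beta = \max(\mathbf{x})$ | $M\min(\mathbf{x}) + \max_{t \in [\alpha,1]} (\delta(t) - Mt)$ | $M\max(\mathbf{x}) + \min_{t \in [0,\beta]} (\delta(t) - Mt)$ |
| Opposite diagonal $\omega(t) = f(t, 1-t)$ | given by (6.25) | given by (6.25) |
| Opposite diagonal $\omega(t) = f(t, 1-t)$, p = 1; $\alpha = \min\{x_1, 1 - x_2\}$, $\beta = \max\{x_1, 1 - x_2\}$ | $M S_L(\mathbf{x}) - M + \max_{t \in [\alpha,\beta]} \omega(t)$ | $M T_L(\mathbf{x}) + \min_{t \in [\alpha,\beta]} \omega(t)$ |
| Marginal $g = f(\mathbf{x})\vert_{\mathbf{x} \in \Omega}$, where $\Omega$ is the domain of the marginal | $\max_{\mathbf{z} \in \Omega}\{g(\mathbf{z}) - M\lVert(\mathbf{z} - \mathbf{x})_+\rVert\}$ | $\min_{\mathbf{z} \in \Omega}\{g(\mathbf{z}) + M\lVert(\mathbf{x} - \mathbf{z})_+\rVert\}$ |
| Mixed function, conjunctive in $[0,e]^n$, disjunctive in $[e,1]^n$ | 0, if $\mathbf{x} \in [0,e]^n$; $\max(\mathbf{x})$, if $\mathbf{x} \in [e,1]^n$; (6.17), elsewhere | $\min(\mathbf{x})$, if $\mathbf{x} \in [0,e]^n$; 1, if $\mathbf{x} \in [e,1]^n$; (6.19), elsewhere |
| Mixed function, disjunctive in $[0,e]^n$, conjunctive in $[e,1]^n$ | $\max(\mathbf{x})$, if $\mathbf{x} \in [0,e]^n$; e, if $\mathbf{x} \in [e,1]^n$; e, elsewhere | e, if $\mathbf{x} \in [0,e]^n$; $\min(\mathbf{x})$, if $\mathbf{x} \in [e,1]^n$; e, elsewhere |
| Mixed function, disjunctive in $[0,e]^n$, idempotent in $[e,1]^n$ | $\max(\mathbf{x})$, if $\mathbf{x} \in [0,e]^n$; $\min(\mathbf{x})$, if $\mathbf{x} \in [e,1]^n$; e, elsewhere | e, if $\mathbf{x} \in [0,e]^n$; $\max(\mathbf{x})$, if $\mathbf{x} \in [e,1]^n$; $\max(\mathbf{x})$, elsewhere |
| Mixed function, conjunctive in $[0,e]^n$, idempotent in $[e,1]^n$ | 0, if $\mathbf{x} \in [0,e]^n$; $\min(\mathbf{x})$, if $\mathbf{x} \in [e,1]^n$; $e - M\lVert(\mathbf{e} - \mathbf{x})_+\rVert$, elsewhere | $\min(\mathbf{x})$, if $\mathbf{x} \in [0,e]^n$; $\max(\mathbf{x})$, if $\mathbf{x} \in [e,1]^n$; $\min\{\max(\mathbf{x}), (6.19)\}$, elsewhere |

Computation of the minimum in the expression for $B_u$ involves a nonconvex
m-dimensional constrained optimization problem. There is a possibility
of multiple locally optimal solutions, and the use of local descent
algorithms will not deliver correct values. The proper way of calculating
$B_u$ is to use deterministic global optimization methods. We recommend using
the Cutting Angle Method, see Appendix A.5.5. One should be aware that
deterministic global optimization methods work reliably only in small
dimension, m < 10. We do not expect m to be greater than 3 in applications.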


6.4.7 Noble reinforcement 


We recall from Section 3.7 a special class of disjunctive aggregation functions,
which limit the mutual reinforcement of the inputs. We have considered several
instances of the noble reinforcement requirement, namely,


1. Provide reinforcement of only high inputs. 

2. Provide reinforcement if at least k inputs are high. 

3. Provide reinforcement of at least k high inputs, if at least m of these 
inputs are very high. 

4. Provide reinforcement of at least k high inputs, if we have at most m low 
inputs. 


Of course, it is possible to combine these requirements. We have seen
that an ordinal sum of t-conorms construction provides a solution to the first
requirement. In this section we provide aggregation functions that satisfy the
other mentioned requirements for crisp subsets of high, very high and low
inputs. Fuzzification of these sets is achieved as described in Section 3.7.

Thus we concentrate on the construction of aggregation functions that
satisfy conditions set in the Definitions 3.137. We will focus on Lipschitz
aggregation functions with a given Lipschitz constant $M \ge 1$. We have defined
three crisp thresholds $\alpha, \beta, \gamma$, $\gamma < \alpha < \beta$, so that the intervals
$[\alpha, 1]$, $[\beta, 1]$ and $[0, \gamma]$ denote respectively high, very high and low inputs.

The above mentioned requirements translate into the following aggregation 
functions (see pp. 9193). 


1. Reinforcement of high inputs 


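\[
F_\alpha(x) = \begin{cases}
A_{i \in \mathcal{E}}(x), & \text{if } \exists \mathcal{E} \subseteq \{1,\dots,n\} \mid \forall i \in \mathcal{E}: x_i \ge \alpha \text{ and } \forall i \in \tilde{\mathcal{E}}: x_i < \alpha, \\
\max(x) & \text{otherwise},
\end{cases} \tag{6.29}
\]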





2. Reinforcement of k high inputs 


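\[
F_{\alpha,k}(x) = \begin{cases}
A_{i \in \mathcal{E}}(x), & \text{if } \exists \mathcal{E} \subseteq \{1,\dots,n\} \mid |\mathcal{E}| \ge k,\ \forall i \in \mathcal{E}: x_i \ge \alpha \text{ and } \forall i \in \tilde{\mathcal{E}}: x_i < \alpha, \\
\max(x) & \text{otherwise},
\end{cases} \tag{6.30}
\]

3. Reinforcement of k high inputs with at least m very high inputs

\[
F_{\alpha,\beta,k,m}(x) = \begin{cases}
A_{i \in \mathcal{E}}(x), & \text{if } \exists \mathcal{E} \subseteq \{1,\dots,n\} \mid |\mathcal{E}| \ge k,\ \forall i \in \mathcal{E}: x_i \ge \alpha,\ \forall i \in \tilde{\mathcal{E}}: x_i < \alpha, \\
 & \text{and } \exists D \subseteq \mathcal{E} \mid |D| = m,\ \forall i \in D: x_i \ge \beta, \\
\max(x) & \text{otherwise},
\end{cases} \tag{6.31}
\]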





4. Reinforcement of k high inputs with no more than m low inputs 


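\[
F_{\alpha,\gamma,k,m}(x) = \begin{cases}
A_{i \in \mathcal{E}}(x), & \text{if } \exists \mathcal{E} \subseteq \{1,\dots,n\} \mid |\mathcal{E}| \ge k,\ \forall i \in \mathcal{E}: x_i \ge \alpha,\ \forall i \in \tilde{\mathcal{E}}: x_i < \alpha, \\
 & \text{and } \exists D \subseteq \{1,\dots,n\} \mid |D| = n - m,\ \forall i \in D: x_i > \gamma, \\
\max(x) & \text{otherwise},
\end{cases} \tag{6.32}
\]

where $A_{i \in \mathcal{E}}(x)$ is a disjunctive aggregation function, applied only to the
components of $x$ with the indices in $\mathcal{E}$.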


Reinforcement of high inputs 


Consider condition (6.29) with a fixed $\alpha$. Denote by $\mathcal{E}$ a subset of indices
$\{1,\dots,n\}$ and by $\tilde{\mathcal{E}}$ its complement. For $k = 0,\dots,n$, denote by $E_k$ the set
of points in $[0,1]^n$ which have exactly $k$ coordinates greater than $\alpha$, i.e.,

\[
E_k = \{x \in [0,1]^n \mid \exists \mathcal{E} \text{ such that } |\mathcal{E}| = k,\ \forall i \in \mathcal{E}: \alpha < x_i \text{ and } \forall j \in \tilde{\mathcal{E}}: x_j \le \alpha\}.
\]

The subsets $E_k$ form a non-intersecting partition of $[0,1]^n$. Further,
$E_0 \cup E_1 \cup \dots \cup E_k$ is a compact set.

Eq. (6.29) reads that $F_\alpha(x) = \max(x)$ on $E_0$, and $F_\alpha(x) \ge \max(x)$ on the
rest of the domain, and further $F_\alpha(x) \ge F_\alpha(y)$ for all $x \in E_k$, $y \in E_m$, $k > m$.
The latter is due to monotonicity with respect to argument cardinality. Also,
since no reinforcement can happen on the subset $E_1$, we have $F_\alpha(x) = \max(x)$
on $E_1 \cup E_0$. This expresses the essence of the noble reinforcement requirement.

Let us now determine the upper and lower bounds $B_u$, $B_l$ on $F_\alpha$ from (6.6).
We use $\Omega = E_1 \cup E_0 \cup \{(1,\dots,1)\}$, as this is the set on which the values of
the aggregation function are specified. The datum $F_\alpha(1,\dots,1) = 1$ implies
the upper bound $F_\alpha(x) \le 1$. The general lower bound is $F_\alpha(x) \ge \max(x)$ due
to its disjunctive character. Now we need to find the upper bound $B_u$ which
results from the condition $F_\alpha = \max$ on $E_1 \cup E_0$.

Thus for any fixed $x$ we need to compute

\[
B_u(x) = \min_{z \in E_1 \cup E_0} \{\max(z) + M \|(x - z)_+\|\}.
\]

This is a multivariate nonlinear optimization problem, which can be reduced
to $n$ univariate problems, which are easy to solve. Consider $x \in E_k$, for
a fixed $k$, $1 < k \le n$, which means that $k$ components of $x$ are greater than
$\alpha$. Let $j \in \mathcal{E}$ be some index such that $x_j > \alpha$. It has been shown that
the minimum is achieved at $z^*$ whose $j$-th component $z_j^* \in [\alpha, x_j]$ is given by
the solution to a univariate optimization problem (Eq. (6.33) below), and the
rest of the components are fixed at $z_i^* = \alpha$, $i \ne j$. That is, we only need to
find the optimal value of the component $z_j$, and then take the minimum over all
$j \in \mathcal{E}$.

Denote by $\gamma_j^* = \sum_{i \in \mathcal{E}, i \ne j} (x_i - \alpha)^p$. We have $k = |\mathcal{E}|$ univariate problems

\[
B_u(x) = \min_{j \in \mathcal{E}} \min_{\alpha \le z_j \le x_j} \{ z_j + M (\gamma_j^* + (x_j - z_j)^p)^{1/p} \}. \tag{6.33}
\]


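For a fixed $j$ the objective function is a convex function of $z_j$ and hence
will have a unique minimum (possibly many minimizers). A modification of
an earlier proposition establishes this minimum explicitly.

Proposition 6.14. Let $\gamma \ge 0$, $M \ge 1$, $p \ge 1$, $\alpha, \beta \in [0,1]$, $\alpha \le \beta$ and

\[
f_\beta(t) = t + M((\beta - t)^p + \gamma)^{1/p}.
\]

The minimum of $f_\beta(t)$ on $[\alpha, \beta]$ is achieved at

• $t^* = \alpha$, if $M = 1$;
• $t^* = \beta$, if $p = 1$ and $M > 1$;
• $t^* = \mathrm{Med}\left\{\alpha,\ \beta - \left(\frac{\gamma}{M^{\frac{p}{p-1}} - 1}\right)^{1/p},\ \beta\right\}$ otherwise,

and its value is

\[
\min f_\beta(t) = \begin{cases}
\alpha + M(\gamma + (\beta - \alpha)^p)^{1/p}, & \text{if } t^* = \alpha, \\
\beta + M\gamma^{1/p}, & \text{if } t^* = \beta, \\
\beta + (M^{\frac{p}{p-1}} - 1)^{\frac{p-1}{p}} \gamma^{1/p} & \text{otherwise}.
\end{cases} \tag{6.34}
\]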
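The closed form of Proposition 6.14 is easy to check numerically. The sketch below is an illustration only: the function name `lipschitz_min` is hypothetical, and the code assumes $\gamma \ge 0$, $M \ge 1$, $p \ge 1$ and $\alpha \le \beta$ as in the proposition.

```python
def lipschitz_min(alpha, beta, gamma, M, p):
    """Minimizer t* and minimum of f(t) = t + M*((beta - t)**p + gamma)**(1/p) on [alpha, beta]."""
    def f(t):
        return t + M * ((beta - t) ** p + gamma) ** (1.0 / p)

    if M == 1:
        t_star = alpha                      # f is non-increasing in (beta - t)
    elif p == 1:
        t_star = beta                       # f grows with slope M - 1 > 0 in (beta - t)
    else:
        q = p / (p - 1.0)
        interior = beta - (gamma / (M ** q - 1.0)) ** (1.0 / p)
        t_star = sorted([alpha, interior, beta])[1]   # median of the three values
    return t_star, f(t_star)
```

For instance, with $\alpha = 0.2$, $\beta = 0.9$, $\gamma = 0.05$ and $M = p = 2$ the interior stationary point $\beta - (\gamma/3)^{1/2}$ lies inside $[\alpha, \beta]$, so the third case of (6.34) applies.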





Example 6.15. Consider the case of 1-Lipschitz aggregation functions, $M = 1$,
$p = 1$. Define the subset $\mathcal{E} \subseteq \{1,\dots,n\}$ as in (6.29). The minimum in (6.33)
is achieved at $z_j = \alpha$ for every $j \in \mathcal{E}$. The upper bound $B_u$ is given as

\[
B_u(x) = \min\Big(1,\ \alpha + \sum_{i \mid x_i > \alpha} (x_i - \alpha)\Big).
\]

The largest 1-Lipschitz aggregation function with noble reinforcement with
threshold $\alpha$ is given as

\[
F_\alpha(x) = \begin{cases}
\min\Big\{1,\ \alpha + \sum_{i \mid x_i > \alpha} (x_i - \alpha)\Big\}, & \text{if } \exists i: x_i > \alpha, \\
\max(x) & \text{otherwise},
\end{cases} \tag{6.35}
\]

which is the ordinal sum of the Łukasiewicz t-conorm and max.
The optimal aggregation function is

\[
f(x) = \frac{1}{2}(\max(x) + F_\alpha(x)),
\]

which is no longer a t-conorm.
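A minimal sketch of Example 6.15 follows. The function names are hypothetical; `noble_upper` evaluates (6.35) and `noble_optimal` takes the midpoint between the lower bound $\max(x)$ and that upper bound.

```python
def noble_upper(x, alpha):
    """Largest 1-Lipschitz function with noble reinforcement, Eq. (6.35)."""
    high = [xi for xi in x if xi > alpha]
    if not high:                       # no high inputs: no reinforcement
        return max(x)
    return min(1.0, alpha + sum(xi - alpha for xi in high))

def noble_optimal(x, alpha):
    """Midpoint between the lower bound max(x) and the upper bound (6.35)."""
    return 0.5 * (max(x) + noble_upper(x, alpha))
```

With a single high input the upper bound reduces to that input, so only two or more high inputs reinforce each other.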


We would like to mention that even though we did not use associativity,
the aggregation function $F_\alpha$ is defined for any number of arguments in a
consistent way, preserving the existence and the value of the neutral element
$e = 0$. To pass from a crisp threshold $\alpha$ to a fuzzy set high, we apply an earlier
equation (3.29), with function $A$ replaced by $F_\alpha$, and $\alpha$ taking values in the
discrete set $\{x_1,\dots,x_n\}$.


Reinforcement of k high inputs 


Now consider function (6.30), which involves the cardinality of $\mathcal{E}$. It reads that
$F_{\alpha,k}(x)$ is the maximum whenever fewer than $k$ components of $x$ are greater than
or equal to $\alpha$. Therefore we use the interpolation condition $F_{\alpha,k}(x) = \max(x)$
on $\Omega = E_0 \cup E_1 \cup \dots \cup E_{k-1}$. As earlier, $B_u$ is given by

\[
B_u(x) = \min_{z \in \Omega = E_0 \cup E_1 \cup \dots \cup E_{k-1}} \{\max(z) + M\|(x - z)_+\|\}. \tag{6.36}
\]


Let us compute this bound explicitly. We have an $n$-variate minimization
problem which we intend to simplify. As earlier, $x$ is fixed and $\mathcal{E}$ denotes
the subset of components of $x$ greater than $\alpha$, $\tilde{\mathcal{E}}$ denotes its complement and
$|\mathcal{E}| \ge k$. The minimum with respect to those components of $z$ whose indices
are in $\tilde{\mathcal{E}}$ is achieved at any $z_i^* \in [x_i, \alpha]$, $i \in \tilde{\mathcal{E}}$. So we fix these components, say,
at $z_i^* = \alpha$ and concentrate on the remaining part of $z$.

At most $k - 1$ of the remaining components of $z$ are allowed to be greater
than $\alpha$ when $z$ ranges over $\Omega$; we denote them by $z_{K_1},\dots,z_{K_{k-1}}$, $K \subset \mathcal{E}$,
$|K| = k - 1$. The minimum with respect to the remaining components is achieved at
$z_i^* = \alpha$, $i \notin K$. Now take all possible subsets $K \subset \mathcal{E}$ and reduce the $n$-variate
minimization problem to a number of $(k-1)$-variate problems with respect to
$z_{K_1},\dots,z_{K_{k-1}}$:

\[
B_u(x) = \min_{K \subset \mathcal{E}, |K| = k-1}\ \min_{z_i,\, i \in K} \Big\{ \max_{i \in K}(z_i) + M\Big(\gamma_K^* + \sum_{i \in K} ((x_i - z_i)_+)^p\Big)^{1/p} \Big\},
\]

where $\gamma_K^* = \sum_{i \in \mathcal{E} \setminus K} (x_i - \alpha)^p$.

It has been shown that the minimum for a fixed $K$ is achieved when all
the variables $z_i$, $i \in K$ are equal, and hence we obtain univariate minimization
problems with $t = z_{K_1}$,

\[
B_u(x) = \min_{K \subset \mathcal{E}, |K| = k-1}\ \min_{t \in [\alpha, x_{K_1}]} \Big\{ t + M\Big(\gamma_K^* + \sum_{i \in K} ((x_i - t)_+)^p\Big)^{1/p} \Big\}, \tag{6.37}
\]

where $x_{K_1} = \max_{i \in K} x_i$.
The minimum over all subsets $K$ in (6.37) has to be computed exhaustively.
The problem with respect to $t$ is convex and piecewise smooth. It can be easily
solved by the golden section method, and in some cases explicitly.

For the special case $p = 1$, $M \ge 1$ we have $t = x_{K_1}$, and

\[
B_u(x) = \min_{K \subset \mathcal{E}, |K| = k-1} \Big( x_{K_1} + M \sum_{i \in \mathcal{E} \setminus K} (x_i - \alpha) \Big). \tag{6.38}
\]

For $p = 2$ we obtain a quadratic equation in $t$ which we also can solve explicitly.


Example 6.16. Consider again the case of 1-Lipschitz aggregation functions,
$M = 1$, $p = 1$, for $n = 4$ and the requirement that at least $k = 3$ high
arguments reinforce each other. We reorder the inputs as $x_{(1)} \ge x_{(2)} \ge x_{(3)} \ge x_{(4)}$.
Applying (6.38) we have the strongest 1-Lipschitz aggregation function

\[
F_\alpha(x) = \begin{cases}
\min\{1,\ x_{(1)} + x_{(3)} - \alpha\}, & \text{if } x_{(4)} \le \alpha \text{ and } x_{(1)}, x_{(2)}, x_{(3)} > \alpha, \\
\min\{1,\ x_{(1)} + x_{(3)} + x_{(4)} - 2\alpha\}, & \text{if all } x_{(1)},\dots,x_{(4)} > \alpha, \\
\max(x) & \text{otherwise}.
\end{cases} \tag{6.39}
\]

Note the absence of $x_{(2)}$ (the subsets $\mathcal{E} \setminus K$ in (6.38) have cardinality 2 when
$x_{(4)} > \alpha$ and 1 when $x_{(4)} \le \alpha$). The optimal aggregation function is

\[
f(x) = \frac{1}{2}(\max(x) + F_\alpha(x)).
\]


The algorithmic implementation of (6.30) or (6.37), see Figure 6.7, is
straightforward (the former is a special case of the latter).


Note 6.17. When the noble reinforcement property does not involve the minimum 
cardinality k of the set of high arguments (property (6.29)), use k = 2. 
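The two algorithms of Figure 6.7 can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the names `golden_min`, `upper_bound` and `noble_reinforcement` are hypothetical, indices are 0-based, $k \ge 2$ is required (cf. Note 6.17), and the explicit test for fewer than $k$ high inputs is added here because on that set the function coincides with the maximum.

```python
from itertools import combinations

def golden_min(f, lo, hi, tol=1e-10):
    """Minimum value of a unimodal function f on [lo, hi] by golden section search."""
    r = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if f(c) < f(d):          # minimum lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:                    # minimum lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    # The end points are tested too, since the piecewise smooth objective
    # often attains its minimum there.
    return min(f(0.5 * (a + b)), f(lo), f(hi))

def upper_bound(x, alpha, E, k, M, p):
    """Algorithm 1: the bound B_u(x) of (6.37); assumes k >= 2 and len(E) >= k."""
    best = float("inf")
    for K in combinations(E, k - 1):                                 # Step 1
        gamma = sum((x[i] - alpha) ** p for i in E if i not in K)    # Step 1.1
        x_top = max(x[i] for i in K)                                 # Step 1.2
        def objective(t, K=K, gamma=gamma):                          # Step 1.3
            s = sum(max(x[i] - t, 0.0) ** p for i in K)
            return t + M * (gamma + s) ** (1.0 / p)
        best = min(best, golden_min(objective, alpha, x_top))        # Step 2
    return best

def noble_reinforcement(x, alpha, k, M=1.0, p=1.0):
    """Algorithm 2: F_{alpha,k}(x) as the midpoint of the lower and upper bounds."""
    E = [i for i, xi in enumerate(x) if xi > alpha]                  # Step 1
    if len(E) < k:            # fewer than k high inputs: F coincides with max
        return max(x)
    b_u = upper_bound(x, alpha, E, k, M, p)                          # Step 2
    return 0.5 * (min(1.0, b_u) + max(x))                            # Step 3
```

For $x = (0.7, 0.65, 0.6, 0.55)$, $\alpha = 0.5$, $k = 3$ and $M = p = 1$, the bound agrees with (6.39): $x_{(1)} + x_{(3)} + x_{(4)} - 2\alpha = 0.85$. The exhaustive loop over subsets $K$ grows combinatorially with $n$, which is acceptable for the small numbers of inputs considered here.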


Reinforcement of k high inputs with at least m very high inputs 


We consider calculation of the bound $B_u$ in (6.31). We proceed as earlier,
but note that the definition of the subset $\Omega$ where the value of $F_{\alpha,\beta,k,m}$ is
restricted to max has changed. Fortunately, we can still use the algorithms
from the previous section as follows. We can write $\Omega = \Omega_1 \cup \Omega_2$, where
$\Omega_1 = E_0 \cup E_1 \cup \dots \cup E_{k-1}$ as in (6.37), and

\[
\Omega_2 = \{x \in [0,1]^n \mid \exists D \text{ such that } |D| = m,\ \forall i \in \{1,\dots,n\} \setminus D: x_i < \beta\},
\]

i.e., $\Omega_2$ is the set of points that have less than $m$ coordinates greater than $\beta$.
According to (6.31), $F_{\alpha,\beta,k,m}$ is restricted to max on that subset.

Algorithm 1

Purpose: Find the upper bound $B_u(x)$ given by (6.36).
Input: Vector $x$, threshold $\alpha$, subset $\mathcal{E}$, cardinality $k$, Lipschitz constant $M$,
norm $\|\cdot\|_p$.
Output: The value $B_u(x)$.

Step 1 For every subset $K \subset \mathcal{E}$, $|K| = k - 1$ do
    Step 1.1 Compute $\gamma_K^* = \sum_{i \in \mathcal{E} \setminus K} (x_i - \alpha)^p$.
    Step 1.2 Compute the largest component $x_{K_1} = \max_{i \in K} x_i$.
    Step 1.3 Find the minimum
             $\sigma_K = \min_{t \in [\alpha, x_{K_1}]} \{ t + M(\gamma_K^* + \sum_{i \in K} ((x_i - t)_+)^p)^{1/p} \}$
             by using the golden section method.
Step 2 Compute $B_u = \min_{K \subset \mathcal{E}} \sigma_K$.
Step 3 Return $B_u$.

Algorithm 2

Purpose: Compute the value of an aggregation function with noble reinforcement
of at least $k$ components (6.30).
Input: Vector $x$, threshold $\alpha$, the minimum cardinality $k$ of the subset of
reinforcing arguments, Lipschitz constant $M$, norm $\|\cdot\|_p$.
Output: The value $F_{\alpha,k}(x)$.

Step 1 Compute the subset of indices $\mathcal{E} = \{i \mid x_i > \alpha\}$.
Step 2 Call Algorithm 1($x$, $\alpha$, $\mathcal{E}$, $k$, $M$, $p$) and save the output in $B_u$.
Step 3 Compute $F_{\alpha,k} = \frac{\min\{1, B_u\} + \max(x)}{2}$.
Step 4 Return $F_{\alpha,k}$.

Fig. 6.7. Algorithmic implementation of Eqs. (6.30) and (6.37)

Next we show that the bound $B_u$ can be easily computed by adapting our
previous results. First, note that

\[
\begin{aligned}
B_u(x) &= \min_{z \in \Omega} \{\max(z) + M\|(x - z)_+\|_p\} \\
 &= \min\Big\{ \min_{z \in \Omega_1} \{\max(z) + M\|(x - z)_+\|_p\},\ \min_{z \in \Omega_2} \{\max(z) + M\|(x - z)_+\|_p\} \Big\}.
\end{aligned}
\]


We already know how to compute the minimum over $\Omega_1$ by using (6.37).
Consider a partition of $[0,1]^n$ into subsets

\[
D_j = \{x \in [0,1]^n \mid \exists D \text{ such that } |D| = j,\ \forall i \in D: \beta < x_i \le 1,\ \forall i \in \tilde{D}: x_i \le \beta\},
\]

for $j = 0,\dots,n$. It is analogous to the partition given by $E_k$ earlier in this
section, with $\beta$ replacing $\alpha$. $D_j$ is the set of input vectors with $j$ very high scores.

Now $\Omega_2 = D_0 \cup \dots \cup D_{m-1}$. Thus computation of the minimum over $\Omega_2$ is
completely analogous to the minimum over $\Omega_1$ (cf. (6.36)), the only difference
is that we take $m \le k$ instead of $k$ and $\beta$ rather than $\alpha$. Hence we apply the
solution given in (6.37) for $m > 1$.

The special case $m = 1$, i.e., the requirement "at least one score should be
very high", is treated differently. In this case $\Omega_2 = D_0$ and solution (6.37) is not
applicable. But in this case an optimal solution is $z^* = (\beta, \beta, \dots, \beta)$.
The value of the minimum in this case is

\[
\min_{z \in \Omega_2} \{\max(z) + M\|(x - z)_+\|_p\} = \beta + M\Big(\sum_{i \mid x_i > \beta} (x_i - \beta)^p\Big)^{1/p}.
\]
° ilai>B 


Reinforcement of k high inputs with at most m low inputs 


Here we have $1 < k \le n$ and $0 \le m \le n - k$; when $m = 0$ we prohibit
reinforcement when at least one low input is present. We consider calculation
of the bound $B_u$ in (6.32).

We proceed similarly to the previous case. Form a partition of $[0,1]^n$: for
$j = 0,\dots,n$ define

\[
D_j = \{x \in [0,1]^n \mid \exists D \subseteq \{1,\dots,n\} \text{ such that } |D| = j,\ \forall i \in D: \gamma < x_i \le 1,\ \forall i \in \tilde{D}: x_i \le \gamma\}.
\]

$D_j$ is the set of points with $n - j$ small coordinates, and the aggregation
function $F_{\alpha,\gamma,k,m}$ should be restricted to the maximum on $\Omega_3 = D_0 \cup \dots \cup D_{n-m-1}$,
as well as on $\Omega_1 = E_0 \cup \dots \cup E_{k-1}$, hence

\[
\begin{aligned}
B_u(x) &= \min_{z \in \Omega} \{\max(z) + M\|(x - z)_+\|_p\} \\
 &= \min\Big\{ \min_{z \in \Omega_1} \{\max(z) + M\|(x - z)_+\|_p\},\ \min_{z \in \Omega_3} \{\max(z) + M\|(x - z)_+\|_p\} \Big\},
\end{aligned}
\]

where the minimum over $\Omega_1$ is computed by using (6.37), and the minimum
over $\Omega_3$ is computed analogously (by replacing $\alpha$ with $\gamma$ and $k$ with $n - m$).


All the requirements discussed in this section led to defining the subset $\Omega$
on which the aggregation function coincides with the maximum. In all cases
we used the basic equations (6.4), and reduced the $n$-variate minimization
problems to a number of univariate problems, of the same type as (6.37),
with different parameters. Numerical computation of the bounds $B_u$ is done
efficiently by applying the algorithms of Fig. 6.7 with different parameters.


7


Other Types of Aggregation 
and Additional Properties 


In this concluding chapter we give a brief overview of several types of
aggregation functions which have not been mentioned so far, as well as pointers
to selected literature. Many specific types of aggregation functions have been
developed in recent years, and an exhaustive overview is outside the scope of
this book. We also briefly summarize some mathematical properties of
aggregation functions not mentioned in Chapter 1. Some
of the recent developments are covered in two new monographs on the topic


14 37. 


7.1 Other types of aggregation 


Aggregation functions with flying parameter 


We have studied several families of aggregation functions which depend on
some parameter. For instance power means (Definition 2.12), families of
t-norms and t-conorms (Section 3.4.11), families of copulas (Section 3.5),
families of uninorms (Section 4.2.3), T-S functions (Section 4.5) and some others.
In all cases the parameter which characterizes a specific member of each family 
did not depend on the input. An obvious question arises: what if the param- 
eter is also a function of the inputs? This leads to aggregation functions with 
flying parameter aa, p.43, Pza. 


Proposition 7.1. Let $f_r$, $r \in [0,1]$ be a family of aggregation functions, such
that $f_{r_1} \le f_{r_2}$ as long as $r_1 \le r_2$, and let $g$ be another aggregation function.
Then the function

\[
f_g(x_1,\dots,x_n) = f_{g(x_1,\dots,x_n)}(x_1,\dots,x_n)
\]

is also an aggregation function. Further, if $f_r$ is a family of conjunctive,
disjunctive, averaging or mixed aggregation functions, $f_g$ also belongs to the
respective class.

The same result is valid for extended aggregation functions. Note that
many families of aggregation functions are defined with respect to a parameter
ranging over $[0,\infty]$ or $[-\infty,\infty]$. To apply the flying parameter construction, it is
possible to redefine parameterizations using some strictly increasing bijection
$\varphi: [0,1] \to [0,\infty]$ (or $\to [-\infty,\infty]$), for instance $\varphi(t) = \frac{t}{1-t}$. In this case we
have $f_r = f_{\varphi(g(x_1,\dots,x_n))}$.


Example 7.2. Let $f_r$ be the Hamacher family of t-norms $T_\lambda^H$, $\lambda \in [0,\infty]$
(p. 152), and let $g$ be the arithmetic mean $M$. The function

\[
T_M^H(x) = T_{\varphi(M(x))}^H(x)
\]

is a conjunctive aggregation function (but it is not a t-norm).


Example 7.3. Let $f_\gamma$ be the family of linear convex T-S functions $L_{\gamma,T,S}$
(p. 231) and let $g = T_P$, the product t-norm. The function

\[
L_{T_P,T,S}(x) = L_{T_P(x),T,S}(x) = (1 - T_P(x)) \cdot T(x) + T_P(x) \cdot S(x)
\]

is an aggregation function. Interestingly, in the case of $T = T_P$, $S = S_P$
(product and dual product) and $g = U$, the $3$-$\Pi$ uninorm (p. 209), we
obtain [43], p.43,

\[
L_{U,T_P,S_P} = U.
\]
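The identity of Example 7.3 can be checked numerically with a short sketch. The function names are hypothetical; `three_pi` is the $3$-$\Pi$ function $\prod x_i / (\prod x_i + \prod (1 - x_i))$, which is undefined when both products vanish, so such inputs are assumed not to occur.

```python
from math import prod

def t_prod(x):
    """Product t-norm T_P."""
    return prod(x)

def s_prod(x):
    """Dual product (probabilistic sum) S_P."""
    return 1.0 - prod(1.0 - xi for xi in x)

def three_pi(x):
    """3-Pi uninorm U; undefined when both products are zero."""
    a, b = prod(x), prod(1.0 - xi for xi in x)
    return a / (a + b)

def flying_ts(x, g, T=t_prod, S=s_prod):
    """Linear convex T-S function whose parameter is the value g(x)."""
    r = g(x)
    return (1.0 - r) * T(x) + r * S(x)
```

For $x = (0.3, 0.8)$ and $g = T_P$ the value is $0.76 \cdot 0.24 + 0.24 \cdot 0.86 = 0.3888$, while with $g = U$ the construction returns $U(x)$ itself.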


Weighting vectors of various aggregation functions, such as weighted
quasi-arithmetic means, OWA, weighted t-norms and t-conorms, weighted
uninorms, etc., can also be made dependent on the input vector $x$. Examples of
such functions are the Bajraktarevic mean (Definition 2.48 on p. 66),
counter-harmonic means (see p. 63), mixture functions (see p. 67) and neat OWA
(Definition 2.55 on p. 73). One difficulty when making the weights dependent on
$x$ is that the resulting function $f$ is not necessarily monotone non-decreasing,
and hence is not an aggregation function. For mixture functions a sufficient
condition for monotonicity of $f$ was established recently, see the discussion
on p. 67.


Averaging functions based on penalty functions 
Given a vector $x$, it is possible to formulate the following problem: what would
be the value $y$ which is in some sense closest to all values $x_i$, $i = 1,\dots,n$? The
measure of closeness is evaluated by using

\[
P(x, y) = \sum_{i=1}^{n} w_i\, p(x_i, y), \tag{7.1}
\]

where $p: [0,1]^2 \to [0,\infty]$ is some "penalty", or dissimilarity, function, with
the properties i) $p(t,s) = 0$ if and only if $t = s$, and ii) $p(t_1,s) \ge p(t_2,s)$
whenever $t_1 \ge t_2 \ge s$ or $t_1 \le t_2 \le s$, and $w$ is a vector of non-negative
weights [183]. The value of the aggregation function $f$ is the value $y^*$ which
minimizes the total penalty $P(x,y)$ with respect to $y$ for a given $x$,
$f(x) = y^* = \arg\min_y P(x,y)$. $f$ is necessarily an averaging aggregation function.

To ensure $y^*$ is unique, one can use a so-called "faithful" penalty
function, $p(t,s) = K(h(t) - h(s))$, where $h: [0,1] \to [-\infty,\infty]$ is some
continuous monotone function and $K: [-\infty,\infty] \to [0,\infty]$ is convex. Under these
conditions $P$ has a unique minimum, but possibly many minimizers. $y^*$ is then
the midpoint of the set of minimizers of (7.1).

In the special case $p(t,s) = (t - s)^2$, $f$ becomes a weighted arithmetic
mean, and in the case $p(t,s) = |t - s|$ it becomes a weighted median. If
$p(t,s) = (h(t) - h(s))^2$ one obtains a weighted quasi-arithmetic mean with the
generator $h$. Many other aggregation functions, including OWA and ordered
weighted medians, can also be obtained, see [183].

Note that closed form solutions to the penalty minimization problem, such 
as those mentioned above, are rare. In general one has to solve the optimization 
problem numerically. On the other hand, this method offers a great flexibility 
in dealing with means. 
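When no closed form is available, the minimization of the total penalty can be carried out numerically. The sketch below is an illustration only: the name `penalty_aggregate` is hypothetical, and the golden section search used here assumes that $P(x, \cdot)$ is unimodal on $[0,1]$, which holds for convex penalties. It returns one minimizer, not necessarily the midpoint of the set of minimizers.

```python
def penalty_aggregate(x, w, p, tol=1e-10):
    """Minimize P(x, y) = sum_i w_i * p(x_i, y) over y in [0, 1] by golden section."""
    def P(y):
        return sum(wi * p(xi, y) for xi, wi in zip(x, w))

    r = (5 ** 0.5 - 1) / 2
    a, b = 0.0, 1.0
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if P(c) < P(d):          # minimizer lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:                    # minimizer lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    return 0.5 * (a + b)
```

With the squared penalty and weights $(1, 1, 2)$ on $x = (0.2, 0.5, 0.9)$ the result approximates the weighted arithmetic mean $0.625$; with the absolute penalty and equal weights it approximates the median $0.5$.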

A related class of averaging functions is called deviation means [40], p.
316. The role of penalty functions is played by the deviation functions
$d: [0,1]^2 \to \mathbb{R}$ which are continuous and strictly increasing with respect to the
second argument and satisfy $d(t,t) = 0$. The equation

\[
\sum_{i=1}^{n} w_i\, d(x_i, y) = 0
\]

has a unique solution, which is the value of $f(x)$. If $d(t,s) = h(s) - h(t)$
for some continuous strictly monotone function $h$, one recovers the class of
weighted quasi-arithmetic means with the generator $h$.
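Since $d$ is increasing in its second argument, the left-hand side of the deviation equation changes sign between $\min(x)$ and $\max(x)$, and bisection finds the root. The following sketch uses the hypothetical name `deviation_mean` and assumes positive weights.

```python
def deviation_mean(x, w, d, tol=1e-12):
    """Solve sum_i w_i * d(x_i, y) = 0 for y by bisection on [min(x), max(x)]."""
    def g(y):
        return sum(wi * d(xi, y) for xi, wi in zip(x, w))

    lo, hi = min(x), max(x)      # g(lo) <= 0 <= g(hi) because d(t, t) = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $d(t,s) = s^2 - t^2$, i.e., the generator $h(t) = t^2$, the result is the quadratic mean of the inputs.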


Bi-capacities 


Recently the concept of a fuzzy measure was extended to set functions on a
product $2^N \times 2^N$, $N = \{1,\dots,n\}$, called bi-capacities. Formally,
let

\[
Q(N) = \{(A, B) \in 2^N \times 2^N \mid A \cap B = \emptyset\}.
\]

A discrete bi-capacity is the mapping $v: Q(N) \to [-1,1]$, non-decreasing
with respect to set inclusion in the first argument and non-increasing in the
second, and satisfying:

\[
v(\emptyset, \emptyset) = 0, \quad v(N, \emptyset) = 1, \quad v(\emptyset, N) = -1.
\]

Bi-capacities are useful for aggregation on bipolar scales (on the interval
$[-1,1]$), and are used to define a generalization of the Choquet integral as
an aggregation function. Bi-capacities are represented by $3^n$ coefficients. The
Möbius transformation, interaction indices and other quantities have been
defined for bi-capacities as well [112, 144, 258], see also [214].

Aggregation functions on sets other than intervals


In this book we were interested exclusively in aggregation functions on $[0,1]$
(or any closed interval $[a,b]$). However there are many constructions
in which the aggregation functions are defined on other sets. We mention just
two examples.

Aggregation functions can be defined on discrete sets, such as
$S = \{0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1\}$, i.e., as functions $f: S^n \to S$. An example of such
aggregation functions are discrete t-norms and copulas. Another example
is the class of weighted ordinal means [148].

Aggregation functions can also be defined on sets of intervals, i.e., when
the arguments and the value of an aggregation function $f$ are not numbers
from $[0,1]$ but intervals. Formally, the lattice $\mathcal{L}^I = (L^I, \le_{L^I})$ where

\[
L^I = \{[a,b] \mid (a,b) \in [0,1]^2 \text{ and } a \le b\},
\]

\[
[a,b] \le_{L^I} [c,d] \Leftrightarrow (a \le c \text{ and } b \le d) \text{ for all } [a,b], [c,d] \in L^I.
\]

Such aggregation functions are useful when aggregating membership values of
interval-valued and intuitionistic fuzzy sets.
Some recent results related to interval-based t-norms can be found in 


Linguistic aggregation functions 


The linguistic OWA function has been proposed on the basis of the
definition of a convex combination of linguistic variables [69]. This approach
was subsequently extended to linguistic OWG functions [125].

Linguistic variables are variables whose values are labels of fuzzy sets.
The arguments of an aggregation function, such as linguistic
OWA or linguistic OWG, are the labels from a totally ordered universal set
$S$, such as $S = \{$Very low, Low, Medium, High, Very high$\}$. For example,
such arguments could be the assessments of various alternatives by several
experts, which need to be aggregated, e.g., $f(L, M, H, H, VL)$. The result of
such aggregation is a label from $S$.


Multistage aggregation 


Double aggregation functions were introduced in [50] with the purpose of
modeling a multistage aggregation process. They are defined as

\[
f(x) = a(g(y), h(z)),
\]

where $a, g, h$ are aggregation functions, and $y \in [0,1]^k$, $z \in [0,1]^m$, $k + m = n$
and $x = y|z$. "$\cdot|\cdot$" denotes concatenation of two vectors. A typical application
of such operators is when the information contained in $y$ and $z$ is of different
nature, and is aggregated in different ways. $a$ may have more than two
arguments.

1 The standard OWA and OWG functions were defined on pp. [6873]

Double aggregation functions can be used to model logical constructions
of the following kind: If (A AND B AND C) OR (D AND E) then...

In this process some input values are combined using one aggregation 
function, other arguments are combined using a different function, and at 
the second stage the outcomes are combined with a third function. While 
the resulting function f is an ordinary aggregation function, it has certain 
structure due to the properties of functions a,g and h, such as right- and 
left-symmetry, and so forth [50].
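A two-stage construction of this kind can be sketched as follows. The helper name `double_aggregation` is hypothetical, and min and max are used for AND and OR only as one possible choice of conjunctive and disjunctive functions.

```python
def double_aggregation(a, g, h, k):
    """Return f with f(x) = a(g(y), h(z)), where y = x[:k] and z = x[k:]."""
    def f(x):
        return a([g(x[:k]), h(x[k:])])
    return f

# (A AND B AND C) OR (D AND E), with min as AND and max as OR
rule = double_aggregation(max, min, min, 3)
```

For the input $(0.9, 0.4, 0.7, 0.6, 0.8)$ the first group yields $0.4$, the second $0.6$, and the outer function returns $0.6$.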

Multi-step Choquet integrals have been treated in [186, 194, 237].


7.2 Some additional properties 


As we already mentioned, there are several criteria that may help in the
selection of the most suitable aggregation function, and one of these criteria is
the fulfillment of some specific mathematical properties. The most important
properties, such as idempotency, symmetry, associativity, existence of a
neutral or an absorbing element, etc., have been studied in detail in Chapter 1. In
this section we briefly summarize some additional properties of aggregation
functions.


Invariantness and Comparison Meaningfulness properties 


The properties of homogeneity, shift-invariance, linearity and self-duality,
presented in Chapter 1, are special cases of a more general property,
invariantness, which characterizes an important class of scale-independent functions:


Definition 7.4 (Invariant aggregation function). Given a monotone
bijection $\varphi: [0,1] \to [0,1]$, an aggregation function $f$ is called $\varphi$-invariant if it
verifies $f = f_\varphi$, where $f_\varphi$ is defined as

\[
f_\varphi(x_1,\dots,x_n) = \varphi^{-1}(f(\varphi(x_1),\dots,\varphi(x_n))).
\]

An aggregation function is said to be invariant when it is $\varphi$-invariant for every
monotone bijection $\varphi$.
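Definition 7.4 can be probed on a finite sample of inputs. The sketch below is only a numerical check on sample points, not a proof of invariance, and all names in it are hypothetical.

```python
def conjugate(f, phi, phi_inv):
    """Return f_phi with f_phi(x) = phi_inv(f(phi(x_1), ..., phi(x_n)))."""
    return lambda x: phi_inv(f([phi(xi) for xi in x]))

def is_invariant(f, phi, phi_inv, samples, tol=1e-9):
    """Check f = f_phi on the given sample points only."""
    f_phi = conjugate(f, phi, phi_inv)
    return all(abs(f(x) - f_phi(x)) <= tol for x in samples)

def mean(x):
    return sum(x) / len(x)

def first(x):                      # projection onto the first coordinate
    return x[0]

square = lambda t: t * t           # increasing bijection of [0, 1]
root = lambda t: t ** 0.5          # its inverse
neg = lambda t: 1.0 - t            # standard negation, its own inverse

samples = [[i / 4, j / 4] for i in range(5) for j in range(5)]
```

On these samples the minimum passes the test for the increasing bijection $t^2$ but fails it for the negation $1 - t$ (it is mapped to the maximum), the arithmetic mean fails for $t^2$, and the projection passes both tests.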


When $\varphi$ is a strong negation, $\varphi$-invariant aggregation functions are nothing
but N-self-dual aggregation functions (see, e.g. [160, 161]), i.e., N-symmetric
sums, which have been studied in detail. On the other hand, the
class of functions invariant under increasing bijections has been completely
characterized, both in the continuous case and in the general case. Note that
the only invariant aggregation functions are projections.

A related property is called comparison meaningfulness.


Definition 7.5 (Comparison meaningful aggregation function). Given
a strictly increasing bijection $\varphi: [0,1] \to [0,1]$, an aggregation function $f$ is
called $\varphi$-comparison meaningful if for any $x, y \in [0,1]^n$

\[
f(x) \begin{Bmatrix} < \\ = \end{Bmatrix} f(y) \quad \text{implies} \quad f(\varphi(x)) \begin{Bmatrix} < \\ = \end{Bmatrix} f(\varphi(y)),
\]

where for any $z \in [0,1]^n$, $\varphi(z)$ denotes the vector $(\varphi(z_1),\dots,\varphi(z_n))$. An
aggregation function is said to be comparison meaningful when it is $\varphi$-comparison
meaningful for every strictly increasing bijection $\varphi$.

Comparison meaningful functions are presented in [168].


The Non-Contradiction and Excluded-Middle Laws 


There are two well-known logical properties, the Non-Contradiction (NC)
and the Excluded-Middle (EM) laws, which were studied in the context of
aggregation functions from two different points of view. Focussing on
Non-Contradiction, this law can be stated in its ancient Aristotelian formulation
as follows: for any statement $p$, the statements $p$ and not $p$ cannot hold at the
same time, i.e., $p \wedge \neg p$ is impossible, where the binary operation $\wedge$ represents
the and connective and the unary operation $\neg$ stands for negation. This
formulation can be interpreted in at least two different ways, depending on how
the term impossible is understood [240]:

1. Taking an approach common in modern logic, the term impossible can
be thought of as false, and then the NC principle can be expressed in a
logical structure with the minimum element 0 as $p \wedge \neg p = 0$.

2. Another possibility, which is closer to ancient logic, is to interpret
impossible as self-contradictory (understanding that an object is self-contradictory
whenever it entails its negation). In this case, the NC principle can be
written as $p \wedge \neg p \vdash \neg(p \wedge \neg p)$, where $\vdash$ represents an entailment relation.








In the context of aggregation functions, if the operation $\wedge$ is represented
by means of a bivariate aggregation function $f: [0,1]^2 \to [0,1]$, and the logical
negation is modeled by a strong negation² $N$, the NC law can be interpreted
in the following ways:

Definition 7.6. Let $f: [0,1]^2 \to [0,1]$ be a bivariate aggregation function and
let $N: [0,1] \to [0,1]$ be a strong negation.

• $f$ satisfies the NC law in modern logic w.r.t. $N$ if $f(t, N(t)) = 0$ holds
for all $t \in [0,1]$.

• $f$ satisfies the NC law in ancient logic w.r.t. $N$ if $f(t, N(t)) \le
N(f(t, N(t)))$ holds for all $t \in [0,1]$.

2 See the definition of a strong negation in Chapter 1.


Similar arguments can be applied to the Excluded-Middle law, and they
result in the following definition, dual to Definition 7.6.

Definition 7.7. Let $f: [0,1]^2 \to [0,1]$ be a bivariate aggregation function and
let $N: [0,1] \to [0,1]$ be a strong negation.

• $f$ satisfies the EM law in modern logic w.r.t. $N$ if $f(t, N(t)) = 1$ holds
for all $t \in [0,1]$.

• $f$ satisfies the EM law in ancient logic w.r.t. $N$ if $N(f(t, N(t))) \le
f(t, N(t))$ holds for all $t \in [0,1]$.


The satisfaction of the NC and EM laws by various aggregation functions
has been studied under both the ancient logic and the modern logic
interpretations.
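The four conditions of Definitions 7.6 and 7.7 can be tested on a grid of values of $t$. The sketch below is a numerical illustration with hypothetical function names; it uses the standard negation $N(t) = 1 - t$ and a small tolerance for floating point error, and checking a grid is of course not a proof.

```python
TOL = 1e-12
GRID = [i / 20 for i in range(21)]

def standard_negation(t):
    return 1.0 - t

def nc_modern(f, N, grid=GRID):
    return all(abs(f(t, N(t))) <= TOL for t in grid)

def nc_ancient(f, N, grid=GRID):
    return all(f(t, N(t)) <= N(f(t, N(t))) + TOL for t in grid)

def em_modern(f, N, grid=GRID):
    return all(abs(f(t, N(t)) - 1.0) <= TOL for t in grid)

def em_ancient(f, N, grid=GRID):
    return all(N(f(t, N(t))) <= f(t, N(t)) + TOL for t in grid)

def lukasiewicz_t(a, b):           # Lukasiewicz t-norm
    return max(0.0, a + b - 1.0)

def lukasiewicz_s(a, b):           # Lukasiewicz t-conorm
    return min(1.0, a + b)
```

On this grid the Łukasiewicz t-norm satisfies the NC law in both senses, whereas the minimum satisfies it only in the ancient sense, since $\min(t, 1 - t) \le 1/2$ but is not identically zero. Dually, the Łukasiewicz t-conorm satisfies both EM conditions and the maximum only the ancient one.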


Local internality property 


An aggregation function is said to be locally internal when it always provides 
as the output the value of one of its arguments: 


Definition 7.8. An aggregation function $f$ is called locally internal if for all
$x_1,\dots,x_n \in [0,1]$, $f(x_1,\dots,x_n) \in \{x_1,\dots,x_n\}$.

Evidently, any locally internal aggregation function is idempotent (and
then, see Note 1.12, it is an averaging function), but not vice versa. Projections
and order statistics, along with the minimum and maximum,
are trivial instances of locally internal aggregation functions. Other functions
in this class are the left- and right-continuous idempotent uninorms
characterized in [64]. For details on this topic, we refer the reader to [170], where
bivariate locally internal aggregation functions have been characterized and
studied in detail in conjunction with additional properties, such as symmetry,
associativity and existence of a neutral element.


Self-identity property 


Yager introduced the so-called self-identity property, applicable to
extended aggregation functions, and defined as follows.

Definition 7.9. An extended aggregation function $F$ has the self-identity
property if for all $n \ge 1$ and for all $x_1,\dots,x_n \in [0,1]$,

\[
F(x_1,\dots,x_n, F(x_1,\dots,x_n)) = F(x_1,\dots,x_n).
\]

Note that extended aggregation functions that satisfy the self-identity
property are necessarily idempotent (and hence averaging), but the converse
is not true. The arithmetic mean, the a-medians and the functions
min and max are examples of extended idempotent functions with the
self-identity property. A subclass of weighted means that possess this property
has also been characterized.

Extended aggregation functions with the self-identity property verify, in
addition, the following two inequalities:

\[
\begin{aligned}
F(x_1,\dots,x_n, k) &\ge F(x_1,\dots,x_n) \quad \text{if } k \ge F(x_1,\dots,x_n), \\
F(x_1,\dots,x_n, k) &\le F(x_1,\dots,x_n) \quad \text{if } k \le F(x_1,\dots,x_n).
\end{aligned}
\]


A 


Tools for Approximation and Optimization 


In this appendix we outline some of the methods of numerical analysis which
are used as tools for construction of aggregation functions. Most of the
material in this section can be found in standard numerical analysis textbooks.
A few recently developed methods will also be presented, and we will
provide the references to the articles which discuss these methods in detail.


A.1 Univariate interpolation 


Consider a set of data $(x_k, y_k)$, $k = 1,\dots,K$, $x_k, y_k \in \mathbb{R}$. The aim of
interpolation is to define a function $f$, which can be used to calculate the
values at $x$ distinct from $x_k$. The interpolation conditions are specified as
$f(x_k) = y_k$ for all $k = 1,\dots,K$. We assume that the abscissae are ordered,
$x_k < x_{k+1}$, $k = 1,\dots,K-1$.

Polynomial interpolation is the best known method. It consists in defining
$f$ as a polynomial of degree $K - 1$,

\[
f(x) = a_{K-1}x^{K-1} + a_{K-2}x^{K-2} + \dots + a_1 x + a_0,
\]

and then using the $K$ interpolation conditions to determine the unknown
coefficients $a_{K-1},\dots,a_0$. They are found by solving a linear system of equations

\[
\begin{pmatrix}
x_1^{K-1} & x_1^{K-2} & \cdots & x_1 & 1 \\
x_2^{K-1} & x_2^{K-2} & \cdots & x_2 & 1 \\
\vdots & \vdots & & \vdots & \vdots \\
x_K^{K-1} & x_K^{K-2} & \cdots & x_K & 1
\end{pmatrix}
\begin{pmatrix} a_{K-1} \\ a_{K-2} \\ \vdots \\ a_0 \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{pmatrix}.
\]

It is well known that such a system of equations is ill-conditioned, and that
writing down a polynomial $f$ in the Lagrange or Newton basis,

\[
f(x) = b_{K-1}N_{K-1}(x) + b_{K-2}N_{K-2}(x) + \dots + b_1 N_1(x) + b_0 N_0(x),
\]

produces a different representation of the same polynomial, but with a better
conditioned system of equations: a lower triangular matrix for the Newton basis and
the identity matrix in the Lagrange basis.
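For the Newton basis $N_j(x) = \prod_{i<j}(x - x_i)$, the lower triangular system is solved by divided differences. The sketch below uses hypothetical function names and 0-based indices, and assumes distinct abscissae.

```python
def newton_coefficients(xs, ys):
    """Divided differences b_0, ..., b_{K-1} of the Newton form."""
    coef = list(ys)
    n = len(xs)
    for j in range(1, n):
        # Update from the end so that lower-order differences are still available.
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton form by nested multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result
```

For the data $(0, 1), (1, 3), (2, 7)$, sampled from $x^2 + x + 1$, the coefficients are $(1, 2, 1)$ and the interpolant evaluates to $13$ at $x = 3$.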

It is not difficult to generalize this method for interpolation in any
finite-dimensional function space. Let $B_1, B_2, \dots, B_K$ be some linearly independent
functions. They span a linear vector space $V$, such that any $f \in V$ can be
written as

\[
f(x) = \sum_{k=1}^{K} a_k B_k(x).
\]

The interpolation conditions yield the linear system of equations $Ba = y$ with
the matrix

\[
B = \begin{pmatrix}
B_1(x_1) & B_2(x_1) & \cdots & B_K(x_1) \\
B_1(x_2) & B_2(x_2) & \cdots & B_K(x_2) \\
\vdots & \vdots & & \vdots \\
B_1(x_K) & B_2(x_K) & \cdots & B_K(x_K)
\end{pmatrix}.
\]

The usual choices for the functions $B_k$ are the Newton polynomials,
trigonometric functions, radial basis functions and B-splines. B-splines are
especially popular, because they form a basis in the space of polynomial
splines.

Polynomial splines are just piecewise polynomials (i.e., on any interval
$[x_k, x_{k+1}]$ $f$ is a polynomial of degree at most $n$), although typically
continuity of the function $f$ itself and some of its derivatives is required.
For a spline of an odd degree $n = 2m - 1$, the first $m - 1$ derivatives are
continuous and the $m$-th derivative is square integrable. If fewer than $m$
derivatives are square integrable, the spline is said to have deficiency
greater than 1. Typically linear ($n = 1$) and cubic ($n = 3$) splines are
used [77].

The most important property of polynomial interpolating splines (of an
odd degree $2m - 1$ and deficiency 1) is that they minimize the following
functional, interpreted as the energy, or smoothness,$^1$

$$F(f) = \int_{x_1}^{x_K} \left| f^{(m)}(t) \right|^2 dt.$$

Thus they are considered to be the "smoothest" functions that interpolate the
given data. Compared to polynomial interpolation, they do not exhibit unwanted
oscillations of $f$ for a large number of data.

1 And with the conditions $f^{(2m-2)}(x) = 0$ at the ends of the interval
$[x_1, x_K]$; such splines are called natural splines. Alternative conditions
at the ends of the interval are also possible.

B-splines are defined recursively as

$$B_k^{1}(x) = \begin{cases} 1, & \text{if } x \in [t_k, t_{k+1}), \\ 0, & \text{otherwise,} \end{cases}$$

$$B_k^{l+1}(x) = \frac{x - t_k}{t_{k+l} - t_k}\, B_k^{l}(x)
  + \frac{t_{k+l+1} - x}{t_{k+l+1} - t_{k+1}}\, B_{k+1}^{l}(x), \quad l = 1, 2, \ldots,$$

where $t_k$, $k = 1,\ldots,r$ is the set of spline knots, which may or may not
coincide with the data $x_k$, and the upper index $l + 1$ denotes the order of
the B-spline (its degree is $l$). B-splines form a basis in the space of
splines, and the interpolating spline can be expressed as$^2$

$$f(x) = \sum_{k=1}^{r} a_k B_k(x).$$

An important property of B-splines is the local support: $B_k^{l+1}(x) > 0$
only when $x \in [t_k, t_{k+l+1})$. As a consequence, the matrix $B$ of the
system of equations has a banded structure, with just $n$ co-diagonals, and
special solution methods are applied.
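
The recursion above translates almost directly into code. The sketch below is
a plain recursive evaluation, assuming 0-based knot indices and treating terms
with a zero denominator as zero (a common convention for repeated knots). The
function name `bspline` and the uniform knot vector are our own choices, and
production spline libraries use more efficient schemes.

```python
def bspline(k, order, t, x):
    """Value at x of the B-spline of the given order starting at knot t[k].

    order = 1 is the piecewise constant indicator; degree = order - 1.
    """
    if order == 1:
        return 1.0 if t[k] <= x < t[k + 1] else 0.0
    l = order - 1
    left = right = 0.0
    d1 = t[k + l] - t[k]
    if d1 > 0:  # skip the term when knots coincide
        left = (x - t[k]) / d1 * bspline(k, l, t, x)
    d2 = t[k + l + 1] - t[k + 1]
    if d2 > 0:
        right = (t[k + l + 1] - x) / d2 * bspline(k + 1, l, t, x)
    return left + right


# Hypothetical uniform knots; the cubic B-splines (order 4) with k = 0..3
# are the only ones that are non-zero on the interval [3, 4).
knots = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
total = sum(bspline(k, 4, knots, 3.5) for k in range(4))
outside = bspline(0, 4, knots, 4.5)  # x lies outside the support [0, 4)
```

Here `total` illustrates that the B-splines overlapping an interval sum to
one there, and `outside` illustrates the local support property.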


A.2 Univariate approximation and smoothing 


When the data contain inaccuracies, it is pointless to interpolate these data. 
Methods of approximation and smoothing are applied, which produce func- 
tions f that are regularized in some sense, and fit the data in the least squares, 
least absolute deviation or some other sense. 

Let us consider again some basis $\{B_1,\ldots,B_n\}$, so that the
approximation is sought in the form

$$f(x) = \sum_{i=1}^{n} a_i B_i(x). \tag{A.1}$$

Now $n$ is not the same as the number of data $K$. Regularization by
restricting $n$ to a small number $n < K$ is very typical. When $B_i$ are
polynomials, the method is called polynomial regression. The basis functions
are usually chosen as a system of orthogonal polynomials (e.g., Legendre or
Chebyshev polynomials) to ensure a well conditioned system of equations.

The functions $B_i$ can be chosen as B-splines, with a small number of knots
$t_k$ fixed in the interval $[x_1, x_K]$. These splines are called regression
splines. Regardless of the choice of basis functions, the coefficients are
found by solving the over-determined linear system $Ba = y$, with

$$B = \begin{pmatrix}
B_1(x_1) & B_2(x_1) & \cdots & B_n(x_1) \\
B_1(x_2) & B_2(x_2) & \cdots & B_n(x_2) \\
\vdots & \vdots & & \vdots \\
B_1(x_K) & B_2(x_K) & \cdots & B_n(x_K)
\end{pmatrix}.$$

Note that the matrix is rectangular, as $n < K$, and its rank is usually $n$.
Since not all the equations can be fitted simultaneously, we shall talk about
a system of approximate equalities $Ba \approx y$.

2 There are some technical conditions related to the behavior of the spline at
the ends of the interval, for instance natural splines require the
$(2m - 2)$-th derivative at the ends of the interpolation interval to be zero.
Thus there are more basis functions $r$ than the data, in fact $r = K + 2n$.
Also some elements of the B-spline basis require an extended partition, with
the first and the last $n$ knots $t_k$ taken outside $[x_1, x_K]$. These
technicalities are usually built into the spline algorithms.

In the case of least squares approximation, one minimizes the Euclidean
norm of the residuals $\|Ba - y\|_2$, or explicitly,

$$\min_{a} \left( \sum_{k=1}^{K} \Big( \sum_{i=1}^{n} a_i B_i(x_k) - y_k \Big)^2 \right)^{1/2}. \tag{A.2}$$

In that case there are two equivalent methods of solution. The first method
consists in multiplying $B$ by its transpose and getting the system of normal
equations

$$Na = B^T B a = B^T y.$$

The entries of the $n \times n$ matrix $N$ are given as
$N_{ij} = \sum_{k=1}^{K} B_i(x_k) B_j(x_k)$. The system of normal equations can
also be obtained by writing down the gradient of the (squared) objective in
(A.2) and equating it to zero.

The alternative is to solve the system $Ba \approx y$ directly using
QR-factorization. The pseudo-solution will be precisely the vector $a$
minimizing the Euclidean norm $\|Ba - y\|$.
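
The normal equations can be assembled and solved in a few lines. The sketch
below is only an illustration under our own assumptions: the helper names
`solve_linear` and `least_squares`, the basis $\{1, x\}$ and the data are all
hypothetical, and a small Gaussian elimination stands in for a library solver.
For ill-conditioned bases the QR route mentioned above is generally
preferable.

```python
def solve_linear(A, b):
    """Solve the square system A z = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def least_squares(basis, xs, ys):
    """Coefficients a minimizing ||B a - y||_2 via the normal equations."""
    B = [[b(x) for b in basis] for x in xs]          # K x n matrix
    n = len(basis)
    N = [[sum(row[i] * row[j] for row in B) for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * y for row, y in zip(B, ys)) for i in range(n)]
    return solve_linear(N, rhs)


# Hypothetical data fitted with the basis {1, x}, i.e. a straight line.
basis = [lambda x: 1.0, lambda x: x]
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a = least_squares(basis, xs, ys)   # intercept a[0], slope a[1]
```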

An alternative to least squares approximation is the least absolute deviation
(LAD) approximation. Here one minimizes

$$\min_{a} \sum_{k=1}^{K} \Big| \sum_{i=1}^{n} a_i B_i(x_k) - y_k \Big|, \tag{A.3}$$

possibly subject to some additional constraints discussed later. It is known
that the LAD criterion is less sensitive to outliers in the data.

To solve the minimization problem (A.3) one uses the following trick to convert
it to a linear programming (LP) problem. Let $r_k = f(x_k) - y_k$ be the $k$-th
residual. We represent it as a difference of its positive and negative parts,
$r_k = r_k^+ - r_k^-$, $r_k^+, r_k^- \ge 0$. The absolute value is
$|r_k| = r_k^+ + r_k^-$. Now the problem is converted into an LP problem with
respect to $a, r^+, r^-$:

$$\begin{array}{ll}
\text{minimize} & \sum_{k=1}^{K} (r_k^+ + r_k^-) \\
\text{s.t.} & \sum_{i=1}^{n} a_i B_i(x_k) - (r_k^+ - r_k^-) = y_k, \quad k = 1,\ldots,K, \\
 & r_k^+, r_k^- \ge 0.
\end{array} \tag{A.4}$$

The solution is performed by using the simplex method, or by a specialized
version of the simplex method. It is important to note that the system of
linear constraints typically has a sparse structure, therefore for large $K$
the use of special programming libraries that employ sparse matrix
representation is needed.

Regression splines employ a smaller number of basis functions than the number
of data. An alternative method of spline approximation is using smoothing
splines. Natural smoothing splines are solutions to the following problem

$$\text{Minimize } F(f) = p \int_{x_1}^{x_K} \left| f^{(m)}(t) \right|^2 dt
  + \sum_{k=1}^{K} u_k \left( f(x_k) - y_k \right)^2,$$

where the smoothing parameter $p$ controls the balance between the
requirements of smoothness and fitting the data (with $p = 0$ the solution
becomes an interpolating spline, and with $p \to \infty$ it becomes a single
polynomial of degree $m - 1$). The values of the weights $u_k$ express the
relative accuracy of the data $y_k$. Smoothing splines are expressed in an
appropriate basis (usually B-splines) and the coefficients $a$ are found by
solving a $K \times K$ system of equations with a banded matrix ($2m + 1$
co-diagonals). Details are given in [159].


A.3 Approximation with constraints 


Consider a linear least squares or least absolute deviation problem, in which 
together with the data, additional information is available. For example, there 
are known bounds on the function $f$, $L(x) \le f(x) \le U(x)$, or $f$ is known to be
monotone increasing, convex, either on the whole of its domain, or on given 
intervals. This information has to be taken into account when calculating 
the coefficients a, otherwise the resulting function may fail to satisfy these 
requirements (even if the data are consistent with them). This is the problem 
of constrained approximation. 

If the constraints are non-linear, then the problem becomes a nonlinear 
(and sometimes global) optimization problem. This is not desirable, as nu- 
merical solutions to such problems could be very expensive. If the problem 
could be formulated in such a way that the constraints are linear, then it can 
be solved by standard quadratic and linear programming methods. 


Constraints on coefficients 


One typical example is linear least squares or least absolute deviation
approximation, with the constraints on the coefficients $a_i \ge 0$,
$\sum_{i=1}^{n} a_i = 1$. There are multiple instances of this problem in our
book, when determining the weights of various inputs (in the multivariate
setting). We have a system of linear constraints, and in the case of the least
squares approximation, we have the problem

$$\begin{array}{ll}
\text{minimize} & \sum_{k=1}^{K} \Big( \sum_{i=1}^{n} a_i B_i(x_k) - y_k \Big)^2 \\
\text{s.t.} & \sum_{i=1}^{n} a_i = 1, \quad a_i \ge 0.
\end{array} \tag{A.5}$$

It is easy to see that this is a quadratic programming problem, see Section
A.5.2. It is advisable to use standard QP algorithms, as they have proven
convergence and are very efficient.

An alternative is to define new unrestricted variables using

$$b_i = \log \frac{a_{i+1}}{a_i}, \quad i = 1,\ldots,n-1. \tag{A.6}$$

The unconstrained nonlinear optimization problem is solved in the variables
$b_i$, and then the original variables are retrieved using the inverse
transformation

$$a_1 = \frac{1}{Z}, \quad a_2 = a_1 e^{b_1}, \quad a_3 = a_1 e^{b_1 + b_2}, \quad \ldots, \quad
  a_n = a_1 e^{\sum_{i=1}^{n-1} b_i}, \tag{A.7}$$

with $Z = 1 + \sum_{i=1}^{n-1} e^{\sum_{j=1}^{i} b_j}$.

Unfortunately, the quadratic structure of the least squares problem is lost,
and the new problem in the variables $b$ is a multiextremal global optimization
problem (see Sections A.5.4 and A.5.5), which is hard to solve. Nonlinear local
optimization methods can be applied, but they do not lead to the globally
optimal solution.

In the case of the least absolute deviation, it is quite easy to modify the
linear programming problem (A.4) to incorporate the new constraints on the
variables $a_i$.
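
The change of variables (A.6) and (A.7) can be sketched as follows. The
function names `weights_from_b` and `b_from_weights` are our own, and the
forward map assumes strictly positive weights, since the logarithm is
undefined otherwise.

```python
import math


def weights_from_b(b):
    """Inverse transformation (A.7): unrestricted b (length n-1) -> weights a."""
    partial = 0.0
    ratios = [1.0]                      # a_1 / a_1
    for bi in b:
        partial += bi
        ratios.append(math.exp(partial))  # a_{i+1} / a_1 = exp(b_1 + ... + b_i)
    Z = sum(ratios)                     # Z = 1 + sum of the exponentials
    return [r / Z for r in ratios]


def b_from_weights(a):
    """Forward transformation (A.6): b_i = log(a_{i+1} / a_i), all a_i > 0."""
    return [math.log(a[i + 1] / a[i]) for i in range(len(a) - 1)]


b = [0.5, -1.0, 2.0]            # any real values are admissible
a = weights_from_b(b)           # non-negative weights that add up to one
b_back = b_from_weights(a)      # recovers b up to rounding
```

Whatever values an unconstrained optimizer proposes for `b`, the resulting
weights `a` automatically satisfy the constraints of problem (A.5), which is
the point of the substitution.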
The constrained linear least squares or least absolute deviation problems
mentioned above are often stated as follows:

$$\begin{array}{ll}
\text{Solve} & Ax \approx b \\
\text{s.t.} & Cx = d, \\
 & Ex \le f,
\end{array} \tag{A.8}$$

where $A$, $C$, $E$ are matrices of size $k \times n$, $m \times n$ and
$p \times n$, and $b$, $d$, $f$ are vectors of length $k$, $m$, $p$
respectively. The solution to the system of approximate equalities is
performed in the least squares or least absolute deviation sense. Here the
vector $x$ plays the role of the unknown coefficients $a$. There are specially
adapted versions of quadratic and linear programming algorithms for this
problem.


Monotonicity constraints


Another example of a constrained approximation problem is monotone (or
isotone) approximation. Here the approximated function $f$ is known to be
monotone increasing (decreasing), perhaps on some interval, and this has to be
incorporated into the approximation process. There are many methods devoted
to univariate monotone approximation, most of which are based on spline
functions (see, e.g., [219]).

For regression splines, when using the B-spline representation, the problem is
exactly of the type (A.8), as monotonicity can be expressed as a set of linear
inequalities on the spline coefficients. It is possible to choose a different
basis in the space of splines (called T-splines; they are simply linear
combinations of B-splines), in which the monotonicity constraint is expressed
even more simply, as non-negativity (or non-positivity) of the spline
coefficients. The problem is then solved by a simpler version of problem
(A.8), called Non-Negative Least Squares (NNLS) [15].

Alternative methods for interpolating and smoothing splines are based on the
insertion of extra knots (besides the data $x_k$) and solving a convex
nonlinear optimization problem. We mention the algorithms by Schumaker and by
McAllister and Roulier for quadratic interpolating splines, and by Anderson
and Elfving for cubic smoothing splines. Figures A.1 and A.2 illustrate
various monotone splines.


Fig. A.1. Plots of linear regression splines. The spline on the left is not monotone,
even though the data are. The spline on the right has monotonicity constraints
imposed.



















Fig. A.2. Plots of quadratic monotone regression splines. 


Convexity constraints 


Convexity/concavity of the approximation is another frequently desired
property. Convexity can be imposed together with monotonicity. For regression
splines, convexity can also be expressed as a set of linear conditions on the
spline coefficients. This allows the convex spline approximation problem to be
formulated as problem (A.8), which is very convenient. There is also a body of
work on constrained interpolating and smoothing splines.


A.4 Multivariate approximation 


Linear regression 


Linear regression is probably the best known method of multivariate
approximation. It consists in building a hyperplane which fits the data best
in the least squares sense. Let the equation of the hyperplane be

$$f(x) = a_0 + a_1 x_1 + a_2 x_2 + \ldots + a_n x_n.$$

Then the vector of coefficients $a$ can be determined by solving the least
squares problem

$$\text{minimize } \sum_{k=1}^{K} \Big( a_0 + \sum_{i=1}^{n} a_i x_{ik} - y_k \Big)^2,$$

where $x_{ik}$ is the $i$-th component of the vector $x_k$. The linear
regression problem can be immediately generalized if we choose

$$f(x) = a_0 + \sum_{i=1}^{n} a_i B_i(x_i),$$

where $B_i$ are some given functions of the $i$-th component of $x$. In fact,
one can define more than one function $B_i$ for the $i$-th component (we will
treat this case below). Then the vector of unknown coefficients can be
determined by solving $Ba \approx y$ in the least squares sense, in the same
way as for the univariate functions described in Section A.2. The solution can
be obtained by using QR-factorization of $B$.

It is also possible to add some linear constraints on the coefficients $a$,
for example, non-negativity. Then one obtains a constrained least squares
problem, which is an instance of QP. By choosing to minimize the least
absolute deviation instead of the least squares criterion, one obtains an LAD
problem, which is converted to LP. Both cases can be stated as problem (A.8).


Tensor product schemata 


Evidently, linear regression, even with different sets of basis functions
$B_i$, is limited to a relatively simple dependence of $y$ on the arguments
$x$. A more general method is to represent a multivariate function $f$ as a
tensor product of univariate functions

$$f_i(x_i) = \sum_{j=1}^{J_i} a_{ij} B_{ij}(x_i).$$

Thus each univariate function $f_i$ is written in the form (A.1).
Now, take the product $f(x) = f_1(x_1) f_2(x_2) \cdots f_n(x_n)$. It can be
written as

$$f(x) = \sum_{m=1}^{J} b_m B_m(x),$$

where $b_m = a_{1j_1} a_{2j_2} \cdots a_{nj_n}$,
$B_m(x) = B_{1j_1}(x_1) B_{2j_2}(x_2) \cdots B_{nj_n}(x_n)$ and
$J = J_1 J_2 \cdots J_n$. In this way we clearly see that the vector of unknown
coefficients (of length $J$) can be found by solving a least squares (or LAD)
problem (A.2), once we write down the components of the matrix $B$, namely
$B_{km} = B_m(x_k)$.
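
To make the construction of the matrix $B$ concrete, the sketch below computes
one row of it, i.e. the values of all $J$ product functions $B_m$ at a single
data point. The function name `tensor_basis_row` and the use of monomials as
univariate basis functions are assumptions made only for this illustration.

```python
from itertools import product


def tensor_basis_row(univariate_bases, x):
    """Values B_m(x) of all product basis functions at one point x.

    univariate_bases[i] is the list of functions B_ij for the i-th variable.
    """
    per_variable = [[B(xi) for B in bases]
                    for bases, xi in zip(univariate_bases, x)]
    row = []
    for combo in product(*per_variable):   # one factor from every variable
        value = 1.0
        for v in combo:
            value *= v
        row.append(value)
    return row


# Two variables, three basis functions each: J = 3 * 3 = 9 columns.
monomials = [lambda t: 1.0, lambda t: t, lambda t: t * t]
row = tensor_basis_row([monomials, monomials], (2.0, 3.0))

# Five variables, five basis functions each: J = 5**5 columns.
five = [lambda t, p=p: t ** p for p in range(5)]
big_row = tensor_basis_row([five] * 5, (0.5,) * 5)
```

Stacking such rows for all data points $x_k$ gives the matrix $B$; the length
of `big_row` shows how quickly the number of columns grows with the dimension.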

In addition, we can add restrictions on the coefficients, and obtain a
constrained LS or LAD problem (A.8). So in principle one can apply the same
method of solution as in the univariate case. The problem with the tensor
product approach is the sheer number of basis functions and coefficients. For
example, if one uses tensor product splines (i.e., $B_{ij}$ are univariate
B-splines), say $J_1 = J_2 = \ldots = J_n = 5$, and works with the data in
$\mathbb{R}^5$, there are $5^5 = 3125$ unknown coefficients. So the size of the
matrix $B$ will be $K \times 3125$. Typically one needs a large number of data
$K > J$, otherwise the system is ill-conditioned. Furthermore, depending on the
choice of $B_{ij}$, these data need to be appropriately distributed over the
domain (otherwise we may get entire zero columns of $B$). The problem quickly
becomes worse in higher dimensions, a manifestation of the so-called curse of
dimensionality.

Thus tensor product schemata, like tensor splines, are only applicable in a
small dimension when plenty of data is available. For tensor product regression
splines, the data may be scattered, but for interpolating and smoothing splines
they should be given on a rectangular mesh, which is even worse. Hence they
have practical applicability only in the two-dimensional case.

Inclusion of monotonicity and convexity constraints for tensor product
regression splines is possible. In a suitably chosen basis (like B-splines or
T-splines), monotonicity (with respect to each variable) is written as a
set of linear inequalities. Then the LS or LAD approximation becomes the
problem (A.8), which is solved by either quadratic or linear programming
techniques (see Section A.6). These methods handle degeneracy of the matrices
well (e.g., when $K < J$), but one should be aware of their limitations, and of
the general applicability of tensor product schemata to small dimensions only.


RBF and Neural Networks 


These are two popular methods of multivariate nonlinear approximation. In
the case of Radial Basis Functions (RBF) [39], one uses the model

$$f(x) = \sum_{k=1}^{K} a_k\, g(\|x - x_k\|),$$

i.e., the same number of basis functions $g_k = g(\|\cdot - x_k\|)$ as the
data. They are all translations of a single function $g$, which depends on the
radial distance from each datum $x_k$. Popular choices are thin plate splines,
Gaussians and multiquadrics.

The function $g$ is chosen so that it decreases with the argument, so that
data further away from $x$ have little influence on the value of $f(x)$. The
coefficients are found by solving a least squares problem similar to (A.2). The
matrix of this system is not sparse, and the system is large (for large $K$).
Special solution methods based on far-field expansion have been developed
to deal with the computational cost associated with solving such systems of
equations.

We are unaware of any special methods which allow one to preserve
monotonicity of $f$ when using RBF approximation.
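
Returning to the RBF model itself, a minimal sketch for a handful of data is
given below. It assumes a Gaussian $g$ and solves the square $K \times K$
system directly by Gaussian elimination, which is only sensible for small $K$.
The names `rbf_fit`, `rbf_eval`, `solve_linear` and the data are our own
illustrative choices.

```python
import math


def solve_linear(A, b):
    """Solve the square system A z = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def rbf_fit(points, values, g):
    """Coefficients a_k of f(x) = sum_k a_k g(||x - x_k||) matching the data."""
    G = [[g(math.dist(p, q)) for q in points] for p in points]  # dense K x K
    return solve_linear(G, values)


def rbf_eval(points, coef, g, x):
    """Evaluate the RBF model at the point x."""
    return sum(a * g(math.dist(x, p)) for a, p in zip(coef, points))


def gaussian(r):
    return math.exp(-r * r)


# Hypothetical scattered data on the unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [0.0, 0.5, 0.5, 1.0]
coef = rbf_fit(points, values, gaussian)
centre = rbf_eval(points, coef, gaussian, (0.5, 0.5))
```

With as many basis functions as data the least squares residual is zero, so
the model reproduces the data values at the points $x_k$.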

Artificial Neural Networks (ANN) are a very popular method of approximation,
which has a nice parallel with the functioning of neurons, see e.g., [173].
Here a typical approximation model (for a two-layer system) is

$$f(x) = h\left( \sum_{i=1}^{m} w_i\, h\Big( \sum_{j=1}^{n} u_{ij} x_j + u_{i0} \Big) + w_0 \right),$$

where $h$ is the transfer function (a sigmoid-type function like $\tan^{-1}$),
$m$ is the number of hidden neurons, and $w_i$, $u_{ij}$ are the weights to be
determined from the data. Sometimes $x_j$ is replaced with some value
$g(x_j)$.

Training of an ANN consists in identifying the unknown weights $w_i$, $u_{ij}$
from the data, using essentially the same least squares criterion (although
other fitting criteria are often used). Because of the nonlinearity in $h$,
this is no longer a quadratic (or even convex) programming problem. While for
the output layer weights $w_i$ this is less problematic$^3$, for the hidden
layer weights $u_{ij}$ it is a non-convex multiextremal optimization problem,
which requires global optimization methods. In practice training is done by a
crude gradient descent method (called back-propagation), possibly using a
multistart approach. But even a suboptimal set of weights delivers an adequate
precision, and it is argued that more accurate weight determination leads to
"overfitting", when the ANN predicts the training data well, but not other
values of $f$.

Among other multivariate approximation schemes we mention the k-nearest
neighbors approximation, Sibson's natural neighbor approximation and splines
built on triangulations. These methods are described elsewhere.
To our knowledge, only a few methods (mostly based on splines) preserve
monotonicity of the data.


Lipschitz approximation 


This is a new scattered data interpolation and approximation technique based
on the Lipschitz properties of the function $f$. Since the Lipschitz condition
is expressed as

$$|f(x) - f(y)| \le M \|x - y\|,$$

with $\|\cdot\|$ being some norm, it translates into tight lower and upper
bounds on any Lipschitz function with Lipschitz constant $M$ that can
interpolate the data,

$$\sigma_u(x) = \min_{k=1,\ldots,K} \{ y_k + M \|x - x_k\| \},$$

$$\sigma_l(x) = \max_{k=1,\ldots,K} \{ y_k - M \|x_k - x\| \}.$$

Then the best possible interpolant in the worst case scenario is given by the
arithmetic mean of these bounds. Calculation of the interpolant is
straightforward, and no solution to any system of equations (or any training)
is required.
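
The two bounds and their mean take only a few lines to compute. The sketch
below assumes the Euclidean norm and data that are compatible with the chosen
constant $M$ (i.e., $|y_j - y_k| \le M \|x_j - x_k\|$ for all pairs). The
function name `lipschitz_interpolant` and the data are hypothetical.

```python
import math


def lipschitz_interpolant(data, M, x):
    """Return (lower bound, upper bound, their mean) at the point x.

    data is a list of pairs (x_k, y_k), with x_k given as a tuple.
    """
    upper = min(yk + M * math.dist(x, xk) for xk, yk in data)
    lower = max(yk - M * math.dist(xk, x) for xk, yk in data)
    return lower, upper, 0.5 * (lower + upper)


# Hypothetical scattered data, compatible with the Lipschitz constant M = 1.
data = [((0.0, 0.0), 0.0), ((1.0, 0.0), 0.5), ((0.0, 1.0), 0.8)]
M = 1.0
lower, upper, value = lipschitz_interpolant(data, M, (0.5, 0.5))
```

At a data point both bounds coincide with $y_k$, so the mean interpolates the
data; elsewhere the gap `upper - lower` indicates how uncertain the value is.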

The method also works with monotonicity constraints, by using the bounds

$$\begin{array}{l}
\sigma_u(x) = \min_{k} \{ y_k + M \|(x - x_k)_+\| \}, \\
\sigma_l(x) = \max_{k} \{ y_k - M \|(x_k - x)_+\| \},
\end{array} \tag{A.9}$$

where $z_+$ denotes the positive part of the vector $z$:
$z_+ = (\bar z_1, \ldots, \bar z_n)$, with $\bar z_i = \max\{z_i, 0\}$.

In fact many other types of constraints can be included as simple bounds on
$f$, see Chapter 6.

3 One method is to minimize
$\sum_{k=1}^{K} \Big( \sum_{i=1}^{m} w_i\, h\big( \sum_{j=1}^{n} u_{ij} x_{kj} + u_{i0} \big) + w_0 - h^{-1}(y_k) \Big)^2$;
the problem with respect to $w_i$ is a QP problem.

What is interesting about the Lipschitz interpolant is that it provides the
best possible solution in the worst case scenario, i.e., it delivers a function
which minimizes the largest distance from any Lipschitz function that
interpolates the data.

If one is interested in smoothing, then the method of Lipschitz smoothing
is applied [21]. It consists in determining the smoothed values of $y_k$ that
are compatible with a chosen Lipschitz constant. This problem is set as either
a QP or LP problem, depending on whether we use the LS or LAD criterion.

This method has been generalized for locally Lipschitz functions, where 
the Lipschitz constant depends on the values of x, and it works for monotone 
functions. 


A.5 Convex and non-convex optimization 


When fitting a function to the data, or determining the vector of weights, one
has to solve an optimization problem. We have seen that methods of univariate
and multivariate approximation require solving such problems, notably
the quadratic and linear programming problems. In other cases, like ANN
training, the optimization problem is nonlinear. There are several types of
optimization problems that frequently arise, and below we outline some of
the methods developed for each type. We consider continuous optimization
problems, where the domain of the objective function $f$ is $\mathbb{R}^n$ or
a compact subset.

We distinguish unconstrained and constrained optimization. In the first
case the domain is $\mathbb{R}^n$, in the second case the feasible domain is
some subset $X \subset \mathbb{R}^n$, typically determined by a system of
equalities and inequalities. We write

$$\begin{array}{ll}
\text{minimize} & f(x) \\
\text{s.t.} & g_i(x) = 0, \quad i = 1,\ldots,k, \\
 & h_i(x) \le 0, \quad i = 1,\ldots,m.
\end{array} \tag{A.10}$$

A special case arises when the functions $g_i$, $h_i$ are linear (or affine).
The feasible domain is then a convex polytope, and when in addition to this the
objective function is linear or convex quadratic, then special methods of linear
and quadratic programming are applied (see Section A.5.2). They work by
exploring the boundary of the feasible domain, where the optimum is located.

We also distinguish convex and non-convex optimization. A convex function
$f$ satisfies the following condition,

$$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y),$$

for all $\alpha \in [0,1]$ and all $x, y \in \mathrm{Dom}(f)$. If the
inequality is strict, it is called strictly convex. What is good about convex
functions is that they have a unique minimum (possibly many minimizers), which
is the global minimum of the function. Thus if one can check the necessary
conditions for a minimum (called the KKT (Karush-Kuhn-Tucker) conditions), then
one can be certain that the global minimum has been found. Numerical
minimization can be performed by any descent scheme, like a quasi-Newton
method, steepest descent, coordinate descent, etc.

If the function is not convex, it may still have a unique minimum, although
the use of descent methods is more problematic. A special class is that of
log-convex (or $T$-convex) functions, which are functions $f$ such that
$\tilde f = \exp(f)$ (or $\tilde f = T(f)$) is convex. They are also treated by
descent methods (for instance, one can just minimize $\tilde f$ instead of $f$,
as the minimizers coincide).

General non-convex functions can have multiple local minima, and frequently
their number grows exponentially with the dimension $n$. This number can
easily reach $10^{20}$ to $10^{60}$ for $n < 30$. While locating a local
minimum can be done by using descent methods (called local search in this
context), there is no guarantee whatsoever that the solution found is anywhere
near the global minimum. With such a number of local minima, their enumeration
is practically infeasible. This is the problem of global optimization, treated
in Sections A.5.4 and A.5.5. The bad news is that in general the global
optimization problem is unsolvable, even if the minimum is unique$^4$.

Whenever it is possible to take advantage of convexity or its variants, one 
should always do this, as more general methods will waste time by chasing 
non-existent local minima. On the other hand, one should be aware of the 
implications of non-convexity, especially the multiple local minima problem, 
and apply proper global optimization algorithms. 

We shall also mention the issue of non-differentiable (or non-smooth)
optimization. Most local search methods, like quasi-Newton, steepest descent,
conjugate gradient, etc., assume the existence of the derivatives (and
sometimes all second order derivatives, the Hessian matrix). Not every
objective function is differentiable, for example $f(x) = |x|$, or a maximum of
differentiable functions. Calculation of the descent direction at those points
where $f$ does not have a gradient is problematic. Generalizations of the
notion of gradient (like Clarke's subdifferential, or quasi-differential) are
applied. What exacerbates the problem is that the local/global minimizers are
often those points where $f$ is not differentiable. There are a number of
derivative-free methods of non-smooth optimization (see, e.g., [206]), and we
particularly mention the Bundle methods.


4 Take as an example the function $f(x) = 1$ for all $x$ except $x = a$, where
$f(a) = 0$. Unless we know what $a$ is, there is no chance of finding it by
exploring the feasible domain. Even relaxations of this problem allowing for
continuity of $f$ are still unsolvable.


A.5.1 Univariate optimization


The classical method of univariate optimization is Newton's method. It
works by choosing an initial approximation $x_0$ and then iterating the
following step

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}.$$

Of course, the objective function $f$ needs to be twice differentiable and the
derivatives known explicitly. It is possible to approximate the derivatives
using finite difference approximation, which leads to the secant method, and
more generally to various quasi-Newton schemata.

Newton's method converges to a local minimum of $f$, and only if certain
conditions are satisfied, typically if the initial approximation is close to
the minimum (which is unknown in the first place).

For non-differentiable functions the generalizations of the gradient
(subgradient, quasi-gradient [60]) are often used. The minimization scheme
is similar to Newton's method, but the derivative is replaced with an
approximation of its generalized version.

The golden section method (on an interval) is another classical method, which
does not require approximation of the derivatives. It works for unimodal
non-differentiable objective functions, not necessarily convex, by iterating
the following steps. Let $[x_1, x_2]$ be the interval containing the minimum.
At the first step take $x_3 = x_1 + \tau(x_2 - x_1)$,
$x_4 = x_2 - \tau(x_2 - x_1)$, with $\tau = \frac{3 - \sqrt{5}}{2} \approx 0.382$,
a quantity related to the golden ratio $\rho$ as $\tau = 1 - \frac{1}{\rho}$.$^5$
At each iteration the algorithm maintains four points $x_k, x_l, x_m, x_n$, two
of which are the ends of the interval containing the minimum. From these four
points choose the point with the smallest value of $f$, say $x_k$. Then remove
the point furthest from $x_k$, let it be $x_n$. Arrange the remaining three
points in increasing order, let it be $x_m < x_k < x_l$, and determine the new
point using $x = x_l + x_m - x_k$. The minimum is within $[x_m, x_l]$. The
iterations stop when the size of the interval is smaller than the required
accuracy $\delta$: $|x_m - x_l| < \delta$.
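
The two univariate methods above can be sketched as follows. Both functions
are illustrative assumptions rather than reference implementations: the names
`newton_minimize` and `golden_section`, the tolerances and the test functions
are ours, and the golden section routine uses the common two-interior-point
bookkeeping instead of the four-point description given in the text (the
points it generates follow the same golden section rule).

```python
import math


def newton_minimize(df, d2f, x0, tol=1e-10, max_iter=50):
    """Newton iteration x <- x - f'(x) / f''(x) for a local minimum."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x


def golden_section(f, lo, hi, delta=1e-8):
    """Minimize a unimodal f on [lo, hi] without using derivatives."""
    tau = (3.0 - math.sqrt(5.0)) / 2.0          # about 0.382
    x3 = lo + tau * (hi - lo)
    x4 = hi - tau * (hi - lo)
    f3, f4 = f(x3), f(x4)
    while hi - lo > delta:
        if f3 < f4:
            # The minimum lies in [lo, x4]; the old x3 becomes the new x4.
            hi, x4, f4 = x4, x3, f3
            x3 = lo + tau * (hi - lo)
            f3 = f(x3)
        else:
            # The minimum lies in [x3, hi]; the old x4 becomes the new x3.
            lo, x3, f3 = x3, x4, f4
            x4 = hi - tau * (hi - lo)
            f4 = f(x4)
    return 0.5 * (lo + hi)


# f(x) = exp(x) - 2x is smooth and convex, with its minimum at x = ln 2.
x_newton = newton_minimize(lambda x: math.exp(x) - 2.0, math.exp, x0=0.0)

# f(x) = |x - 1| is unimodal but not differentiable at its minimizer x = 1.
x_golden = golden_section(lambda x: abs(x - 1.0), -2.0, 3.0)
```

Only one new function evaluation is needed per golden section iteration,
because one of the two interior points is reused.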

If the objective function is not convex, there could be multiple local minima,
as illustrated in Fig. A.3. Locating the global minimum (on a bounded
interval) should be done by using a global optimization method. Grid search
is the simplest approach, but it is not the most efficient. It is often
augmented with some local optimization method, like Newton's method (i.e., the
local optimization method is called from the nodes of the grid as starting
points).

If the objective function is known to be Lipschitz-continuous, and an
estimate of its Lipschitz constant $M$ is available, the Pijavski-Shubert
method is an efficient way to find and confirm the global optimum. It works
as illustrated in Fig. A.3: a saw-tooth underestimate of the objective function
$f$ is built, using

$$H^K(x) = \max_{k=1,\ldots,K} \{ f(x_k) - M |x - x_k| \},$$

where $K$ denotes the iteration. The global minimizer of $H^K$ is chosen as the
next point at which to evaluate $f$, and the underestimate is updated after
adding the value $f_K = f(x_K)$ and incrementing $K = K + 1$. The sequence of
global minima of the underestimates $H^K$, $K = 1, 2, \ldots$ is known to
converge to the global minimum of $f$.

Calculation of all local minimizers of $H^K$ (the teeth of the saw-tooth
underestimate) is done explicitly, and they are arranged into a priority queue,
so that the global minimizer is always at the top. Note that there is exactly
one local minimizer of $H^K$ between each two neighboring points $x_m$, $x_n$.
The computational complexity of calculating the minimizers of $H^K$ (and
maintaining the priority queue) is logarithmic in $K$. While this method is not
as fast as Newton's method, it guarantees the globally optimal solution in the
case of multiextremal objective functions, and does not suffer from lack of
convergence.

5 The golden ratio is the positive number $\rho$ satisfying the equation
$\rho^2 = \rho + 1$. Consider the interval $[0, 1]$. Divide it into two parts
using the point $r$. Then $r$ is the golden section if

$$\frac{\text{Length of the whole segment}}{\text{Length of the larger segment}}
  = \frac{\text{Length of the larger segment}}{\text{Length of the smaller segment}}.$$





Fig. A.3. Optimization of a non-convex function with many local minima using 
Pijavski-Shubert method. 
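
A compact sketch of the Pijavski-Shubert method is given below. It keeps the
teeth of the saw-tooth underestimate in a priority queue (Python's `heapq`),
and stops when the lowest tooth is within `tol` of the best value found so
far. The function name `pijavski_shubert`, the stopping rule, and the test
function with its Lipschitz estimate are our own assumptions for illustration.

```python
import heapq
import math


def pijavski_shubert(f, lo, hi, M, tol=1e-4, max_iter=10000):
    """Global minimization of a Lipschitz function f on [lo, hi]."""

    def tooth(xa, fa, xb, fb):
        """Lowest point of the underestimate between two neighboring samples."""
        x = 0.5 * (xa + xb) + (fa - fb) / (2.0 * M)
        h = 0.5 * (fa + fb) - 0.5 * M * (xb - xa)
        return (h, x, xa, fa, xb, fb)

    fa, fb = f(lo), f(hi)
    best_x, best_f = (lo, fa) if fa <= fb else (hi, fb)
    heap = [tooth(lo, fa, hi, fb)]
    for _ in range(max_iter):
        h, x, xa, fa, xb, fb = heapq.heappop(heap)   # global minimum of H
        if best_f - h < tol:
            break          # the lower bound h certifies the best value found
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
        # The evaluated point splits one tooth into two new ones.
        heapq.heappush(heap, tooth(xa, fa, x, fx))
        heapq.heappush(heap, tooth(x, fx, xb, fb))
    return best_x, best_f


def wavy(x):
    """A multiextremal test function; |f'(x)| <= 1 + 10/3 everywhere."""
    return math.sin(x) + math.sin(10.0 * x / 3.0)


best_x, best_f = pijavski_shubert(wavy, 2.7, 7.5, M=4.5)
```

If $M$ underestimates the true Lipschitz constant the saw-tooth function is no
longer a valid lower bound, so the guarantee of global optimality is lost.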


A.5.2 Multivariate constrained optimization


Linear programming


As we mentioned, a constrained optimization problem involves constraints on
the variables, which may be linear or nonlinear. If the constraints are linear,
and the objective function is also linear, then we have a special case called a
linear programming (LP) problem. It takes the typical form

$$\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{n} c_i x_i = c^T x \\
\text{s.t.} & Ax = b, \\
 & Cx \le d, \\
 & x_i \ge 0, \quad i = 1,\ldots,n.
\end{array}$$

Here $A$, $C$ are matrices of size $k \times n$, $m \times n$ and $b$, $d$ are
vectors of size $k$ and $m$ respectively. A maximization problem is obtained by
exchanging the signs of the coefficients $c_i$, and similarly "greater than"
type inequalities are transformed into "smaller than". The condition of
non-negativity of $x_i$ can in principle be dropped (such $x_i$ are called
unrestricted) with the help of artificial variables, but it is stated here as
in the standard formulation of LP, and because most solution algorithms assume
it by default.

Each LP problem has an associated dual problem (see any textbook on 
linear programming, e.g., [594 [248}), and the solution to the dual problem 
allows one to recover that of the primal and vice versa. In some cases solution 
to the dual problem is computationally less expensive than that to the primal, 
typically when k and m are large. 

The two most used solution methods are the simplex method and the 
interior point method. In most practical problems, both types of algorithms 
have an equivalent running time, even though in the worst case scenario the 
simplex method is exponential and the interior point method is polynomial 
in complexity. 


Quadratic programming 


A typical quadratic programming problem is formulated as 


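$$
\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2} x^t Q x + c^t x \\
\text{s.t.} \quad & Ax = b \\
& Cx \le d \\
& x_i \ge 0, \quad i = 1, \ldots, n.
\end{aligned}
$$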


Here $Q$ is a symmetric positive semidefinite matrix (hence the objective
function is convex), $A$, $C$ are matrices of constraints, $c$, $b$, $d$ are
vectors of size $n$, $k$, $m$ respectively, and the factor $\frac{1}{2}$ is written
for standardization (note that most programming libraries assume it!).
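The remark about the factor 1/2 is easy to get wrong in practice, so here is a minimal sketch of what it implies. The function `qp_objective` is a hypothetical helper written for this illustration; it simply evaluates the objective in the standard form above.

```python
def qp_objective(Q, c, x):
    """Value of 0.5 * x'Qx + c'x for a dense symmetric Q (list of rows)."""
    n = len(x)
    quad = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(c[i] * x[i] for i in range(n))
    return 0.5 * quad + lin


# Suppose the model is f(x) = x1^2 + x2^2 - 2*x1 (no 1/2 in front).
# For a solver that assumes the factor 1/2, Q must carry 2 on the diagonal.
Q = [[2.0, 0.0], [0.0, 2.0]]
c = [-2.0, 0.0]
value = qp_objective(Q, c, [1.0, 0.0])  # f(1, 0) = 1 - 2 = -1
```

Passing the identity matrix instead would silently minimize a different function, with the quadratic part halved.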

If Q is indefinite (meaning that the objective function is neither convex
nor concave) the optimization problem is extremely complicated because of a
very large number of local minima (an instance of an NP-hard problem). If Q
is negative definite, this is the problem of concave programming, which is also
NP-hard. Standard QP algorithms do not treat these cases, but specialized
methods are available.

What should be noted with respect to both LP and QP is that the com- 
plexity is not in the objective function but in the constraints. Frequently the 
systems of constraints are very large, but they are also frequently sparse (i.e., 
contain many 0s). The inner workings of LP and QP algorithms use meth- 
ods of linear algebra to handle constraints, and special methods are available 
for sparse matrices. These methods avoid operations with 0s, and perform
all operations in sparse matrix format, i.e., when only non-zero elements are 
stored with their indices. It is advisable to identify sparse matrices and apply 
suitable methods. If the matrices of constraints are not sparse, then sparse 
matrix representation is counterproductive. 
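To make the idea of a sparse matrix format concrete, the following minimal sketch stores only the non-zero entries together with their indices (the so-called coordinate format) and multiplies the matrix by a vector without touching any zeros. The function names are illustrative; real LP and QP libraries use their own, more elaborate sparse formats.

```python
def to_sparse(dense):
    """Convert a dense matrix (list of rows) to coordinate format:
    a list of (row, column, value) triples for the non-zero entries."""
    return [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0]


def sparse_matvec(triples, x, n_rows):
    """Compute y = A x using only the stored non-zero entries."""
    y = [0.0] * n_rows
    for i, j, v in triples:
        y[i] += v * x[j]
    return y


A = [[1, 0, 0, 0],
     [0, 0, 2, 0],
     [0, 3, 0, 4]]
A_sparse = to_sparse(A)      # 4 stored entries instead of 12
y = sparse_matvec(A_sparse, [1, 1, 1, 1], n_rows=3)
```

For a dense matrix the list of triples would be three times larger than the matrix itself, which is why the sparse representation is counterproductive in that case.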


General constrained nonlinear programming 


This is an optimization problem in which the objective function is not linear
or quadratic, or the constraints $h_i(x) \le 0$, $i = 1, \ldots, m$ are nonlinear.
There could be multiple minima, so that this is the problem of global
optimization. If the objective function is convex, and the constraints define a
convex feasible set, the minimum is unique. It should be noted that even the
problem of finding a feasible x is already complicated.

The two main approaches to constrained optimization are the penalty
function and the barrier function approach [198]. In the first case, an auxiliary
objective function $\tilde{f}(x) = f(x) + \lambda P(x)$ is minimized, where $P$ is
the penalty term, a function which is zero in the feasible domain and non-zero
elsewhere, increasing with the degree to which the constraints are violated. It
can be smooth or non-smooth [209]. $\lambda$ is a penalty parameter; it is often
the case that a sequence of auxiliary objective functions is minimized, with
increasing values of $\lambda$. Minimization of $\tilde{f}$ is done by local search
methods.
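The penalty approach can be sketched on a one-dimensional toy problem. Everything specific below is an assumption made for illustration: the test problem (minimize $(x-2)^2$ subject to $x - 1 \le 0$), the quadratic penalty term, the schedule of penalty parameters, and the use of golden section search as the local method. In this sketch the penalty parameter is increased from one minimization to the next, so that constraint violations become more and more expensive.

```python
import math


def golden_section(func, a, b, tol=1e-8):
    """Minimize a unimodal function on [a, b] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    f1, f2 = func(x1), func(x2)
    while b - a > tol:
        if f1 < f2:                      # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = func(x1)
        else:                            # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = func(x2)
    return 0.5 * (a + b)


def f(x):
    return (x - 2.0) ** 2                # objective


def h(x):
    return x - 1.0                       # feasible when h(x) <= 0


def P(x):
    return max(0.0, h(x)) ** 2           # zero inside the feasible domain


x = None
for lam in (1.0, 10.0, 100.0, 1000.0, 1e6):
    # auxiliary objective f + lam * P for the current penalty parameter
    x = golden_section(lambda t: f(t) + lam * P(t), -5.0, 5.0)
```

For this problem the minimizer of the auxiliary function is $(2 + \lambda)/(1 + \lambda)$, which approaches the constrained solution $x = 1$ from the infeasible side as $\lambda$ grows.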

In the case of a barrier function, typical auxiliary functions are 


$$
\tilde{f}(x) = f(x) - \lambda \sum_{i=1}^{m} \frac{1}{h_i(x)}, \qquad
\tilde{f}(x) = f(x) - \lambda \sum_{i=1}^{m} \ln(-h_i(x)),
$$


but now the penalty term is non-zero inside the feasible domain, and grows 
as x approaches the boundary. 
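As a small worked illustration (a toy problem chosen here, not taken from the text, and assuming the logarithmic form of the barrier term), consider minimizing $f(x) = (x-2)^2$ subject to $h(x) = x - 1 \le 0$. The auxiliary function and its stationary point inside the feasible domain are

```latex
\tilde{f}(x) = (x-2)^2 - \lambda \ln(1 - x), \qquad x < 1,

\tilde{f}'(x) = 2(x-2) + \frac{\lambda}{1-x} = 0
\;\Longrightarrow\; 2x^2 - 6x + 4 - \lambda = 0
\;\Longrightarrow\; x(\lambda) = \frac{3 - \sqrt{1 + 2\lambda}}{2}.
```

The root with the minus sign is the one satisfying $x < 1$. For $\lambda = 1$ it gives $x \approx 0.634$, well inside the feasible domain, and as $\lambda \to 0$ it tends to the constrained minimizer $x = 1$ from the feasible side.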

Recently Sequential Quadratic Programming (SQP) methods have gained
popularity for solving constrained nonlinear programming problems, especially
those that arise in nonlinear approximation [217]. In essence, this method is
based on solving a sequence of QP subproblems at each iteration of the
nonlinear optimization problem, by linearizing the constraints and approximating
the Lagrangian function of the problem


$$
L(x, \lambda) = f(x) + \sum_{i=1}^{k} \lambda_i g_i(x) + \sum_{i=k+1}^{k+m} \lambda_i h_{i-k}(x)
$$


quadratically (the variables $\lambda_i$ are called the Lagrange multipliers). We
refer the reader to the literature for its detailed analysis.

Note that all mentioned methods converge to a locally optimal solution, if
$f$ or the functions $g_i$ are non-convex. There could be many local optima, and
to find the global minimum, global optimization methods are needed.


A.5.3 Multilevel optimization 


It is often the case that with respect to some variables the optimization prob- 
lem is convex, linear or quadratic, and not with respect to the others. Of 
course one can treat it as a general NLP, but knowing that in most cases we 
will have a difficult global optimization problem, it makes sense to use the 
special structure for a subset of variables. This will reduce the complexity of 
the global optimization problem by reducing the number of variables. 

Suppose that we have to minimize $f(x)$ and $f$ is convex with respect to
the variables $x_i$, $i \in I \subset \{1, 2, \ldots, n\}$, and let
$\bar{I} = \{1, \ldots, n\} \setminus I$ denote the complement of this set. Then
we have


$$
\min_{x} f(x) = \min_{x_i :\, i \in \bar{I}} \;\; \min_{x_i :\, i \in I} f(x).
$$

This is a bi-level optimization problem. At the inner level we treat the variables
$x_i$, $i \in \bar{I}$ as constants, and perform minimization with respect to those
whose indices are in $I$. This is done by some efficient local optimization
algorithm. At the outer level we have the global optimization problem


$$
\min_{x_i :\, i \in \bar{I}} \tilde{f}(x),
$$


where the function $\tilde{f}$ is the solution to the inner problem. In other
words, each time we need a value of $\tilde{f}$, we solve the inner problem with
respect to $x_i$, $i \in I$.

Of course, the inner problem could be an LP (in which case we apply LP
methods), or a QP (we apply QP methods). And in principle it is possible to
follow this separation strategy and have a multi-level programming problem,
where at each level only the variables of a certain kind are treated.

We note that the solution to the inner problem should be the global opti- 
mum, not a local minimum. This is the case when the inner problem is LP or 
QP and the appropriate algorithm is applied. 
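The bi-level scheme can be sketched on a toy function of two variables. All specifics here are assumptions made for illustration: the function `f`, which is quadratic (hence convex) in `x` for fixed `y` and multiextremal in `y`; ternary search as the inner local method; and a crude grid search standing in for a proper global method at the outer level.

```python
import math


def f(x, y):
    # convex (quadratic) in x for fixed y, multiextremal in y
    return (x - y * y) ** 2 + math.sin(5.0 * y) + 0.1 * y * y


def inner_min(y, lo=-10.0, hi=10.0, iters=100):
    """Inner level: minimize f(., y) over x by ternary search
    (valid because f is unimodal in x)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1, y) < f(m2, y):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x, f(x, y)


def outer_min(y_lo=-3.0, y_hi=3.0, n_grid=601):
    """Outer level: grid search over y; every evaluation of the outer
    objective requires solving the inner problem."""
    best = None
    for k in range(n_grid):
        y = y_lo + (y_hi - y_lo) * k / (n_grid - 1)
        x, val = inner_min(y)
        if best is None or val < best[2]:
            best = (x, y, val)
    return best


x_best, y_best, f_best = outer_min()
```

The global search runs over one variable instead of two, which is the point of the construction: the expensive part of the problem has lower dimension.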

Sometimes it is possible to have the situation where the function $f$ is
convex (say, quadratic positive definite) with respect to both subsets of
components $I$ and $\bar{I}$ individually, when the other subset of variables is
kept constant. Then one can interchange the inner and outer problems


$$
\min_{x} f(x) = \min_{x_i :\, i \in \bar{I}} \; \min_{x_i :\, i \in I} f(x)
             = \min_{x_i :\, i \in I} \; \min_{x_i :\, i \in \bar{I}} f(x).
$$

However, this does not mean that f is convex with respect to all variables
simultaneously. In fact it may be that in each case the outer problem is a
multiextremal global optimization problem. It is advisable to use the smallest
subset of variables at the outer level.


A.5.4 Global optimization: stochastic methods 


Global optimization methods are traditionally divided into two broad
categories: stochastic and deterministic [128]. Stochastic methods guarantee the
globally optimal solution only in probability (i.e., they converge to the global
optimum with probability 1, as long as they are allowed to run indefinitely).
Of course any algorithm has to be stopped at some time. It is argued that
stochastic optimization methods return the global minimum in a finite number
of steps with high probability.

Unfortunately there are no rules for how long a method should run to
deliver the global optimum with the desired probability, as it depends on
the objective function. In some problems this time is not that long, but in
others stochastic methods converge extremely slowly, and never find the global
solution after any reasonable running time. This is a manifestation of the
so-called curse of dimensionality, as the issue is aggravated when the number
of variables is increased; in fact the complexity of the optimization problem
grows exponentially with the dimension. Even if the class of the objective
functions is limited to smooth or Lipschitz functions, the global optimization
problem is NP-hard [128], [129], [204].

Fig. A.4. An objective function with multiple local minima and stationary points.

The methods in this category include pure random search (i.e., just evaluate
and compare the values of f at randomly chosen points), multistart local
search, heuristics like simulated annealing, genetic algorithms, tabu search,
and many others, see [188]. The choice of the method depends very much
on the specific problem, as some methods work faster for certain problem
classes. All methods in this category are competitive with one another. There
is no general rule for choosing any particular method; it comes down to trial
and error.
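Two of the simplest stochastic methods named above, pure random search and multistart local search, can be sketched as follows. The test function, the very basic derivative-free local search, the sample sizes and the fixed random seed are all assumptions made for this illustration; they are not recommendations.

```python
import math
import random


def f(x):
    # multiextremal test function on [-3, 3] (illustrative)
    return math.sin(5.0 * x) + 0.1 * x * x


def pure_random_search(func, lo, hi, n_samples, rng):
    """Evaluate func at random points and keep the best one."""
    best_x = rng.uniform(lo, hi)
    best_f = func(best_x)
    for _ in range(n_samples - 1):
        x = rng.uniform(lo, hi)
        fx = func(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def local_descent(func, x, lo, hi, step=0.1, tol=1e-8):
    """Simple local search: move left or right while it improves,
    otherwise halve the step."""
    fx = func(x)
    while step > tol:
        moved = False
        for cand in (x - step, x + step):
            if lo <= cand <= hi:
                fc = func(cand)
                if fc < fx:
                    x, fx, moved = cand, fc, True
                    break
        if not moved:
            step *= 0.5
    return x, fx


def multistart(func, lo, hi, n_starts, rng):
    """Run the local search from random starting points, keep the best."""
    best = None
    for _ in range(n_starts):
        x, fx = local_descent(func, rng.uniform(lo, hi), lo, hi)
        if best is None or fx < best[1]:
            best = (x, fx)
    return best


rng = random.Random(0)   # fixed seed so that the run is reproducible
x_rs, f_rs = pure_random_search(f, -3.0, 3.0, 2000, rng)
x_ms, f_ms = multistart(f, -3.0, 3.0, 30, rng)
```

Neither call certifies that the returned point is the global minimum; they only make it likely, and how likely depends on the function and on the number of samples or starts.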


A.5.5 Global optimization: deterministic methods 


Deterministic methods guarantee a globally optimal solution for some classes 
of objective functions (e.g., Lipschitz functions), however their running time is 
very large. It also grows exponentially with the dimension, as the optimization 
problem is NP-hard. 

We mention in this category the grid search (i.e., systematic exploration
of the domain, possibly with the help of local optimization), Branch-and-Bound
methods (especially the αBB method), space-filling curves (i.e., representing a
multivariate function f through a special univariate function whose values
coincide with those of f along an infinite curve which “fills” the domain), and
multivariate extensions of the Pijavski-Shubert method [18].

One such extension is known as the Cutting Angle method [219], and
the algorithm for the Extended Cutting Angle Method (ECAM) has been described
in detail in the literature. It mimics the Pijavski-Shubert method in Section
A.5.1, although calculation of the local minimizers of the saw-tooth
underestimate H* is significantly more complicated (the number of such
minimizers also grows exponentially with the dimension). However in up to 10
variables this method is quite efficient numerically.
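For orientation, the univariate Pijavski-Shubert idea that these methods extend can be sketched as follows. This is a plain textbook-style sketch, not the ECAM algorithm and not the GANSO implementation; the function name, the stopping rule and the test function are assumptions, and the Lipschitz constant must be supplied by the user.

```python
import bisect
import math


def pijavski_shubert(func, a, b, lipschitz, eps=1e-3, max_iter=10000):
    """Global minimization of a Lipschitz function on [a, b].

    The saw-tooth underestimate is max_k (f(x_k) - M * |x - x_k|); on each
    interval between neighbouring sample points it has one local minimizer.
    """
    xs = [a, b]
    fs = [func(a), func(b)]
    for _ in range(max_iter):
        best_f = min(fs)
        low_val, low_x = math.inf, None
        for i in range(len(xs) - 1):
            # minimizer and minimum of the underestimate on [xs[i], xs[i+1]]
            x_new = (0.5 * (xs[i] + xs[i + 1])
                     + (fs[i] - fs[i + 1]) / (2.0 * lipschitz))
            val = (0.5 * (fs[i] + fs[i + 1])
                   - 0.5 * lipschitz * (xs[i + 1] - xs[i]))
            if val < low_val:
                low_val, low_x = val, x_new
        # low_val is a lower bound on the global minimum
        if best_f - low_val < eps:
            break
        k = bisect.bisect(xs, low_x)
        xs.insert(k, low_x)
        fs.insert(k, func(low_x))
    i_best = fs.index(min(fs))
    return xs[i_best], fs[i_best]


x_best, f_best = pijavski_shubert(
    lambda t: math.sin(5.0 * t) + 0.1 * t * t, -3.0, 3.0, lipschitz=6.0)
```

The returned value is within `eps` of the global minimum provided `lipschitz` really is an upper bound on the Lipschitz constant (here the derivative is bounded by 5.6 on the interval, so 6 is safe).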

We wish to reiterate that there is no magic bullet in global optimization: 
the general optimization problem is unsolvable, and it is NP-hard in the best 
case (when restricting the class of the objective functions). It is therefore very 
important to identify some of the variables, with respect to which the objective 
function is linear, convex or unimodal, and set up a multilevel optimization 
problem, as this would reduce the number of global variables, and improve 
the computational complexity. 


A.6 Main tools and libraries 


There are a number of commercial and free open source programming libraries
that provide efficient and thoroughly tested implementations of the
approximation and optimization algorithms discussed in this Appendix. Below is
just a sample from an extensive collection of such tools, that in the authors’
view are both reliable and sufficiently simple to be used by less experienced
users.

Valuable references are
http://plato.asu.edu/sub/nonlsq.html
http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide
http://www2.informs.org/Resources/
An online textbook on optimization is available from
http://www.mpri.lsu.edu/textbook/TablCont.htm


Linear programming 


A typical linear programming problem is formulated as 


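$$
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{n} c_i x_i = c^t x \\
\text{s.t.} \quad & Ax = b \\
& Cx \le d \\
& x_i \ge 0, \quad i = 1, \ldots, n,
\end{aligned}
$$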


where A,C are matrices of size k x n, m x n and b,d are vectors of size k 
and m respectively. 

The two most used solution methods are the simplex method and the interior
point method [59], [249]. There are a number of standard implementations
of both methods. Typically, the user of a programming library is required
to specify the entries of the arrays c, A, b, C, d, point to the unrestricted
variables, and sometimes specify the lower and upper bounds on $x_i$.⁶ Most
libraries use sparse matrix representation, but they also provide adequate
conversion tools.

The packages GLPK (GNU Linear Programming Kit) and LPSOLVE
http://www.gnu.org/software/glpk/
http://tech.groups.yahoo.com/group/lp_solve/
are both open source and very efficient and reliable. Both implement sparse
matrix representation.

Commercial alternatives include CPLEX http://www.ilog.com, LINDO
http://www.lindo.com, MINOS
http://www.stanford.edu/~saunders/brochure/brochure.html
and many others. These packages also include quadratic and general nonlinear
programming, as well as mixed integer programming modules.


Quadratic programming 


A typical quadratic programming problem is formulated as 


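$$
\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2} x^t Q x + c^t x \\
\text{s.t.} \quad & Ax = b \\
& Cx \le d \\
& x_i \ge 0, \quad i = 1, \ldots, n.
\end{aligned}
$$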


⁶ Even though the bounds can be specified through the general constraints in C, d,
in the algorithms the bounds are processed differently (and more efficiently).

Here $Q$ is a symmetric positive semidefinite matrix (hence the objective
function is convex), $A$, $C$ are matrices of constraints, $c$, $b$, $d$ are
vectors of size $n$, $k$, $m$ respectively. Note that most libraries assume the
factor $\frac{1}{2}$!

An open source QP solver which supports sparse matrices is OOQP,
http://www.cs.wisc.edu/~swright/ooqp/. It requires a separate module
which should be downloaded from HSL Archive, module MA27,
http://www.cse.clrc.ac.uk/nag/hsl/contents.shtml

There are many alternative QP solvers, for example Algorithm 559
http://www.netlib.org/toms/559, as well as the already mentioned commercial
CPLEX, LINDO and MINOS packages; see the guides to optimization software
http://plato.asu.edu/sub/nonlsq.html
http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide


Least absolute deviation problem 


As we mentioned, the LAD problem is converted into an LP problem by using
(A.4), hence any LP solver can be used. However there are specially designed
versions of the simplex method suitable for the LAD problem.

The LAD problem is formulated as follows. Solve the system of equations
$Ax \approx b$ in the least absolute deviation sense, subject to constraints
$Cx = d$ and $Ex \le f$, where A, C, E and b, d, f are matrices and vectors
defined by the user. The computer code can be found in netlib
http://www.netlib.org/ as Algorithm 552 http://www.netlib.org/toms/552, see
also Algorithm 615, Algorithm 478 and Algorithm 551 in the same library.

For Chebyshev approximation code see Algorithm 495 in netlib
http://www.netlib.org/toms/495.

Note that these algorithms are implemented in FORTRAN. A translation
into C can be done automatically by the f2c utility, and is also available from
the authors of this book.

Also we should note that all the mentioned algorithms do not use sparse 
matrix representation, hence they work well with dense matrices of constraints 
or when the number of constraints is not large. For large sparse LADs, use 
the generic LP methods. 


Constrained least squares 


While general QP methods can be applied to this problem, specialized
algorithms are available. Algorithm 587 from netlib solves the following
problem called LSEI (Least Squares with Equality and Inequality constraints).

Solve the system of equations $Ax \approx b$ in the least squares sense, subject
to constraints $Cx = d$ and $Ex \le f$, where A, C, E and b, d, f are matrices
and vectors defined by the user. The algorithm handles degeneracy in the
systems of equations/constraints well.

The computer code (in FORTRAN) can be downloaded from netlib
http://www.netlib.org/toms/587, and its translation into C is available
from the authors.


Nonlinear optimization


As a general reference we recommend the following repositories:
http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide
http://www2.informs.org/Resources/
http://gams.nist.gov/
http://www.mat.univie.ac.at/~neum/glopt/software_l.html


Univariate global optimization 


For convex problems the golden section method is very reliable; it is often
combined with Newton’s method. There are multiple implementations,
see the references to nonlinear optimization above.

For non-convex multiextremal objective functions, we recommend the
Pijavski-Shubert method; it is implemented in the GANSO library as a special
case of ECAM, http://www.ganso.com.au


Multivariate global optimization 


There are a number of repositories and links at
http://www.mat.univie.ac.at/~neum/glopt/software_g.html
http://gams.nist.gov/

The GANSO library http://www.ganso.com.au implements a number of global
methods, both deterministic (ECAM) and stochastic (multistart random
search, heuristics) and also their combinations. It has C/C++, Fortran, Matlab
and Maple interfaces.


Spline approximation 


Various implementations of interpolating, smoothing and regression splines
(univariate and bivariate) are available from Netlib and TOMS
http://www.netlib.org/toms. Monotone univariate and tensor product regression
splines are implemented in the tspline package
http://www.deakin.edu.au/~gleb/tspline.html

Another implementation is FITPACK http://www.netlib.org/fitpack/

Multivariate monotone approximation 


The tspline package http://www.deakin.edu.au/~gleb/tspline.html implements
monotone tensor-product regression splines.

The method of monotone Lipschitz approximation is available from the LibLip
library http://www.deakin.edu.au/~gleb/lip.html, and also
http://packages.debian.org/stable/libs/liblip2


Problems


Problems for Chapter 1


Problem B.1. Prove the statement in Note 1.30 on p. 13 (Hint: compare to
Note 1.25).


Problem B.2. Give other examples of continuous but not Lipschitz continuous
aggregation functions, as in Example 1.64 on p. 23.


Problem B.3. Write down the dual product in Example 1.77, p. 27 explicitly
for two and three arguments.


Problem B.4. Write down the Einstein sum in Example 1.79, p. 27 for three
and four arguments (note that this is an associative function).


Problem B.5. Consider Example 1.88 on p. 30. Show that
$g(x_1, 1) = \frac{4x_1^2}{(1+x_1)^2} \le x_1$ for $x_1 \in [0, 1]$.


Problem B.6. Suppose you have the input vector x = (0.1,0.1,0.5,0.8). Cal- 
culate: 


1. The arithmetic, geometric and harmonic means;
2. Median and a-Median with a = 0.5;
3. OWA with the weighting vector w = (0.5, 0.2, 0, 0.3);
4. Weighted arithmetic mean $M_w$ with the same weighting vector;
5. Product $T_P$, dual product $S_P$, Lukasiewicz t-norm $T_L$ and t-conorm $S_L$;
6. The 3-Π function.
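Hand calculations of this kind can be checked with a few lines of Python. The sketch below uses the usual textbook definitions, which are assumed to match those of the earlier chapters: OWA weights are applied to the inputs sorted in non-increasing order, the median of an even number of inputs is the mean of the two middle values, and the 3-Π function is the product divided by the sum of the product and the product of the complements. The a-median is left out because its definition is not restated here.

```python
import math
import statistics

x = [0.1, 0.1, 0.5, 0.8]
w = [0.5, 0.2, 0.0, 0.3]

arithmetic = sum(x) / len(x)
geometric = math.prod(x) ** (1.0 / len(x))
harmonic = len(x) / sum(1.0 / xi for xi in x)
median = statistics.median(x)      # even n: mean of the two middle values

# OWA: weights applied to the inputs in non-increasing order (assumed)
owa = sum(wi * xi for wi, xi in zip(w, sorted(x, reverse=True)))
weighted_mean = sum(wi * xi for wi, xi in zip(w, x))

product = math.prod(x)                                   # T_P
dual_product = 1.0 - math.prod(1.0 - xi for xi in x)     # S_P
luk_tnorm = max(0.0, sum(x) - (len(x) - 1))              # T_L
luk_tconorm = min(1.0, sum(x))                           # S_L
three_pi = product / (product + math.prod(1.0 - xi for xi in x))
```

Comparing `owa` with `weighted_mean` shows the effect of reordering the inputs before the weights are applied.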


Problem B.7. Show that the function


$$
f(x, y) = \frac{x^2 + y^2}{x + y} \quad \text{if } x + y \ne 0, \quad \text{and } f(0, 0) = 0,
$$


is not an aggregation function. (Hint: you need to check the two main
properties of aggregation, and to check monotonicity use the restriction of this
function to the edges of the unit square. You may graph the resulting four
functions, or compare their values at certain points).


Problem B.8. Express formally the continuity condition and the Lipschitz
condition of any aggregation function. Could you give examples for both cases?


Problem B.9. Show that the following function is a strong negation on [0, 1]


$$
N(t) = \frac{1 - t}{1 + \lambda t} \quad \text{for } \lambda > -1.
$$


Problems for Chapter 2


Problem B.10. Let M, G, H and Q be the arithmetic, geometric, harmonic
and quadratic means respectively. Also for an aggregation function F let
$F_\varphi(x) = \varphi^{-1}(F(\varphi(x_1), \ldots, \varphi(x_n)))$. Prove the
following statements:


1. If $\varphi : [0,1] \to [0,1]$ is given by $\varphi(x) = x^2$ then $M_\varphi = Q$ and $G_\varphi = G$;
2. If $\varphi : [0,1] \to [0,1]$ is given by $\varphi(x) = e^{-1/x}$ then $G_\varphi = H$;
3. If $\varphi : [0,1] \to [0,1]$ is given by $\varphi(x) = 1 - x$ then $M_\varphi = M$.


Problem B.11. Show that
$A(x_1, \ldots, x_n) = \sum_{i=1}^{n} \frac{2^{i-1}}{2^n - 1} x_i$ is a bisymmetric
aggregation function which is neither symmetric nor associative.


Problem B.12. Show that $v(A) = \left(\frac{|A|}{n}\right)^k$ for some $k > 1$ is a
symmetric balanced fuzzy measure, and determine the corresponding OWA function.


Problem B.13. Given the weighting vector w = (0.1, 0.2, 0.1, 0, 0.6), calculate
the orness measure of the OWA function and of the weighted arithmetic
mean. Then calculate the OWA function $OWA_w$ and the weighted mean $M_w$
of the input vector x = (0.4, 0.1, 0.2, 0.6, 0.9).


Problem B.14. Given a fuzzy quantifier $Q(t) = t^2$, calculate the weights of
the OWA function of dimension n = 5. Then calculate its orness measure and
the value of this OWA function at x = (0.4, 0.1, 0.2, 0.6, 0.9).
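The computations asked for in the last two problems follow a fixed recipe, which can be sketched as below. The formulas used are the standard ones and are assumed to agree with Chapter 2: weights $w_i = Q(i/n) - Q((i-1)/n)$ for a quantifier $Q$, orness $\sum_i w_i (n-i)/(n-1)$, and OWA weights applied to the inputs in non-increasing order. The quantifier $t^2$ in the usage lines is likewise an assumption about the intended $Q$.

```python
def owa_weights_from_quantifier(Q, n):
    """w_i = Q(i/n) - Q((i-1)/n), i = 1..n, for a non-decreasing Q
    with Q(0) = 0 and Q(1) = 1."""
    return [Q(i / n) - Q((i - 1) / n) for i in range(1, n + 1)]


def orness(w):
    """orness(w) = sum_i w_i * (n - i) / (n - 1)."""
    n = len(w)
    return sum(wi * (n - i) / (n - 1) for i, wi in enumerate(w, start=1))


def owa(w, x):
    """OWA with weights applied to the inputs in non-increasing order."""
    return sum(wi * xi for wi, xi in zip(w, sorted(x, reverse=True)))


w = owa_weights_from_quantifier(lambda t: t ** 2, 5)
value = owa(w, [0.4, 0.1, 0.2, 0.6, 0.9])
```

An orness below 0.5 indicates that the function leans towards the minimum, which is what one expects from a convex quantifier.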


Problem B.15. Given the fuzzy measure


$$
\begin{aligned}
& v(\{i\}) = 0.1, \quad i = 1, \ldots, 4; \\
& v(\{1,2\}) = v(\{1,2,3\}) = v(\{1,2,4\}) = 0.5; \\
& v(\{1,3\}) = v(\{1,4\}) = v(\{2,3\}) = v(\{2,4\}) = 0.4; \\
& v(\{3,4\}) = v(\{2,3,4\}) = v(\{1,3,4\}) = v(\{1,2,3,4\}) = 1,
\end{aligned}
$$


calculate the Choquet integral of the input vector x = (0.4, 0.1, 0.2, 0.5).
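A discrete Choquet integral is mechanical to evaluate once the measure is stored as a table. The sketch below uses the standard formula based on sorting the inputs in increasing order, which is assumed to be the definition used in Chapter 2; the function name and the dictionary encoding of the measure (frozensets of 1-based indices) are choices made for this illustration.

```python
def choquet(x, v):
    """Discrete Choquet integral of x with respect to the fuzzy measure v.

    v maps frozensets of 1-based indices to values in [0, 1].
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # increasing inputs
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        # indices of all inputs that are at least as large as x[i]
        upper = frozenset(j + 1 for j in order[k:])
        total += (x[i] - prev) * v[upper]
        prev = x[i]
    return total


v = {frozenset(): 0.0}
for i in (1, 2, 3, 4):
    v[frozenset([i])] = 0.1
for s in ([1, 2], [1, 2, 3], [1, 2, 4]):
    v[frozenset(s)] = 0.5
for s in ([1, 3], [1, 4], [2, 3], [2, 4]):
    v[frozenset(s)] = 0.4
for s in ([3, 4], [2, 3, 4], [1, 3, 4], [1, 2, 3, 4]):
    v[frozenset(s)] = 1.0

result = choquet([0.4, 0.1, 0.2, 0.5], v)
```

Only four of the sixteen coefficients are used for any given input, namely those of the nested sets determined by the ordering of the inputs.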


Problem B.16. Given the λ-fuzzy measure on $2^N$, $N = \{1, 2, 3\}$, determined
by

$$
v(\{1\}) = 0.1, \quad v(\{2\}) = 0.1, \quad v(\{3\}) = 0.4,
$$

calculate the value of λ (you may need to solve equation (2.71) numerically),
and the rest of the coefficients of this fuzzy measure. Then calculate the
Choquet integral of the input vector x = (0.4, 0.1, 0.2).


Problem B.17. Let $N = \{1, 2, 3, 4\}$. Determine which of the following set
functions are fuzzy measures, and explain why

1. $v(A) = \sum_{i \in A} i$ for all $A \subseteq N$;
2. $v(A) = 1$ if $A \ne \emptyset$, and $v(A) = 0$ if $A = \emptyset$;
3. $v(A) = \sum_{i \in A} \frac{i}{10}$ for all $A \subseteq N$;
4. $v(A) = 1$ if $A = N$, $v(A) = 0$ if $A = \emptyset$, and $v(A) = 1/3$ otherwise;
5. v(A) = X i- (HE) for al ACN. 
iC A 
Problem B.18. Check for each of the following set functions whether they
are λ-fuzzy measures. When they are, determine λ.


1. $N = \{1, 2\}$ and $v(\{1\}) = 1/2$, $v(\{2\}) = 3/4$, $v(\emptyset) = 0$, $v(N) = 1$;
2N = CaS (0) =0,v(N) = 1; 
if A= i 
3. Let N = {1,2,3} and v( A if A=, forall ACN; 
i otherwise 
4. Let N = {1,2,3} and v( A r a etre for all ACN. 


Problem B.19.


1. Let $v_1$ and $v_2$ be fuzzy measures on $N$. Show that the set function $v$
   defined as $v(A) = \frac{1}{2}(v_1(A) + v_2(A))$ for all $A \subseteq N$ is also
   a fuzzy measure.
2. Prove that the set of all fuzzy measures is convex.¹


Problem B.20. Show that the set of all self-dual measures is convex. Prove 
the same statement for the set of all additive measures. 


Problem B.21. Prove that the aggregation (by means of any aggregation
function) of fuzzy measures is in turn a fuzzy measure, that is, prove the
following statement:

Let $f : [0,1]^m \to [0,1]$ be an aggregation function and let
$v_1, \ldots, v_m : 2^N \to [0,1]$ be fuzzy measures. Then the set function
$v = f(v_1, \ldots, v_m) : 2^N \to [0,1]$, defined for any $A \subseteq N$ as


$$
v(A) = f(v_1(A), \ldots, v_m(A))
$$

is a fuzzy measure.


¹ We recall that a set E is convex if $\alpha x + (1 - \alpha) y \in E$ for all $x, y \in E$, $\alpha \in [0, 1]$.


Problem B.22. Let $v$ be a {0,1}-fuzzy measure on $N = \{1, 2, 3\}$ given by

$$
\begin{aligned}
& v(\emptyset) = 0, \quad v(N) = 1, \quad v(\{1\}) = v(\{2\}) = v(\{3\}) = 0, \\
& v(\{1,2\}) = v(\{1,3\}) = 0, \quad v(\{2,3\}) = 1.
\end{aligned}
$$


Determine its dual fuzzy measure $v^*$. After that, show that $v$ is
superadditive. Determine whether $v^*$ is subadditive or not.


Problem B.23. Are the following statements true or false: 


1. If a measure is balanced then its dual measure is also balanced. 
2. If a measure is symmetric then its dual measure is also symmetric. 


Problem B.24. Let $v$ be a fuzzy measure and $A$, $B$ be subsets in its domain.
Show that:


1. $v(A \cap B) \le \min(v(A), v(B))$;
2. $v(A \cup B) \ge \max(v(A), v(B))$.


Problem B.25. Show that:

1. $Pos(A) = \max_{i \in A} Pos(\{i\})$ for all $A \subseteq N$;
2. $Nec(A) = \min_{i \notin A} Nec(N \setminus \{i\})$ for all $A \subseteq N$.


Problem B.26. Let $(Pos\{1\}, Pos\{2\}, Pos\{3\}, Pos\{4\}, Pos\{5\}) =
(1, 0.5, 0.3, 0.2, 0.6)$ be a possibility measure on $N = \{1, 2, 3, 4, 5\}$.


1. Determine the value of the possibility measure for each subset of $N$.
2. Determine the dual of the possibility measure obtained in part 1.


Problem B.27. Prove the two following statements (assuming $|N| > 2$):


1. Let $f$ be a weighted mean. If $Prob_1, \ldots, Prob_m$ are probability
   measures, then $v = f(Prob_1, \ldots, Prob_m)$ is a probability measure.

2. Let $f$ be given by $f(x_1, \ldots, x_m) = \max_{i=1,\ldots,m} (f_i(x_i))$, where
   $f_i : [0,1] \to [0,1]$ are non-decreasing functions satisfying $f_i(0) = 0$
   for all $i \in \{1, \ldots, m\}$ and $f_i(1) = 1$ for at least one $i$. If
   $Pos_1, \ldots, Pos_m$ are possibility measures, then
   $v = f(Pos_1, \ldots, Pos_m)$ is a possibility measure.


Problem B.28. Let $v$ be an additive fuzzy measure. Show that the measure of
a set $A$ is determined by the measure of its singletons, i.e.,
$v(A) = \sum_{a \in A} v(\{a\})$.


Problem B.29. Determine discrete Choquet and Sugeno integrals of the vector
x with respect to a fuzzy measure v, where v and x are given as follows:




1. N = {1,2} and 
0.5, if A= {1} 
_ jo, fA=0 
AS) 99 A= 
1, ifA=N 
and 


i) x = (0.8,0.4); ii) x = (0.8,0.9). 


2. $N = \{1, 2, 3, 4\}$, $v$ a λ-measure with $v(\{1\}) = 1/15$, $v(\{2\}) = 1/4$,
$v(\{3\}) = 1/5$, $\lambda = 1$ and


i) x = (2/3, 1/5, 1/2, 1); ii) x = (1/2, 1/3, 1/4, 1/5).


Problem B.30. Determine the Choquet integral of $x = (x_1, \ldots, x_4)$ w.r.t.
the additive fuzzy measure determined by v = (0.25, 0.35, 0.15, 0.25) on
$N = \{1, 2, 3, 4\}$, where each component of v corresponds to the measure of the
singleton {i} for i = 1, ..., 4. Which special kind of Choquet integral-based
aggregation function do you obtain?


Problems for Chapter 3


Problem B.31. Prove Proposition 3.20 on p. 132.


Problem B.32. In the discussion of properties of triangular norms on p. 130
we stated that a pointwise minimum or maximum of two t-norms is a)
not generally a t-norm, b) a conjunctive aggregation function. Prove this
statement. You can use a counterexample in part a). Part b) must be a general
proof.


Problem B.33. Refer to properties on p. 130. Prove (by example) that not
all t-norms are comparable.


Problem B.34. Refer to properties on p. 130. Prove that a linear combination
of t-norms $aT_1(x) + bT_2(x)$, $a, b \in \mathbb{R}$, is not generally a t-norm
(provide a counterexample), although it is a conjunctive extended aggregation
function if $a, b \in [0, 1]$, $b = 1 - a$.


Problem B.35. Prove that minimum is a continuous t-norm which is neither
strict nor nilpotent (see Section 3.4.3 and Example 3.26).


Problem B.36. Determine multiplicative generators of Schweizer-Sklar, 
Hamacher, Frank and Yager t-norms and t-conorms, starting from their ad- 
ditive generators. 


Problem B.37. Suppose you have the input vector x = (0.2, 0.1, 0.5, 0.9).
Calculate:
1. The Yager, Dombi, Hamacher and Frank t-norms with parameter λ = 2;
2. The Yager, Dombi, Hamacher and Frank t-conorms with parameter λ = 2.


Problem B.38. Given the following functions $H_i : [0,1]^2 \to [0,1]$ for
$i = 1, \ldots, 4$


_f0, if (ey) € [0,05] x [0,1], 
Í: Hı (zx, y) ~~ coer otherwise, 
_f05, if (x,y) €]0,1P, 
2: Hə(x, y) = bo otherwise, 
3. $H_3(x, y) = xy \max(x, y)$,
4. $H_4(x, y) = 0$,


determine which ones are a) semicopulas, b) quasi-copulas and c) t-norms. 


Problem B.39. Prove the following statements: 


1. $T_D(x, y) \le T_L(x, y)$, for all $x, y \in [0, 1]$;
2. $T_L(x, y) \le T_P(x, y)$, for all $x, y \in [0, 1]$;
3. $T_P(x, y) \le \min(x, y)$, for all $x, y \in [0, 1]$;
4. $T(x, y) \le \min(x, y)$, for all $x, y \in [0, 1]$ and for any t-norm $T$;
5. $T_D(x, y) \le T(x, y)$, for all $x, y \in [0, 1]$ and for any t-norm $T$.
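Before attempting a proof it can help to spot-check the ordering of the four basic t-norms numerically. The sketch below uses their standard bivariate definitions (drastic product, Lukasiewicz, product, minimum) and tests the chain $T_D \le T_L \le T_P \le \min$ on a grid. A grid check is of course not a proof, and the small tolerance is only there to absorb floating point rounding.

```python
def t_min(x, y):
    return min(x, y)


def t_prod(x, y):
    return x * y


def t_luk(x, y):
    return max(0.0, x + y - 1.0)


def t_drastic(x, y):
    # equals the other argument when one argument is 1, and 0 otherwise
    if x == 1.0:
        return y
    if y == 1.0:
        return x
    return 0.0


grid = [k / 20.0 for k in range(21)]
tol = 1e-12
ordering_holds = all(
    t_drastic(x, y) <= t_luk(x, y) + tol
    and t_luk(x, y) <= t_prod(x, y) + tol
    and t_prod(x, y) <= t_min(x, y) + tol
    for x in grid for y in grid
)
```

The middle inequality is the one worth doing by hand first: $x + y - 1 \le xy$ is equivalent to $(1 - x)(1 - y) \ge 0$.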


Problem B.40. Formulate and prove the statements corresponding to those
in Problem B.39 for a t-conorm $S$.


Problem B.41. Show that:


1. $S_L$ and $T_L$ are dual;
2. $S_P$ and $T_P$ are dual.


Problem B.42. Let T and S be dual t-norms and t-conorms with respect to
a strong negation N. Show that the following laws (De Morgan Laws) must
hold:


1. $N(T(x, y)) = S(N(x), N(y))$ for all $x, y \in [0, 1]$;
2. $N(S(x, y)) = T(N(x), N(y))$ for all $x, y \in [0, 1]$.


Problem B.43. Prove that min (max) is the only idempotent t-norm (t- 
conorm). 


Problem B.44. Prove the following statements:


1. A t-conorm $S$ is distributive² over a t-norm $T$ if and only if $T = \min$.
2. A t-norm $T$ is distributive over a t-conorm $S$ if and only if $S = \max$.


² A bivariate function f is distributive over g if $f(x, g(y, z)) = g(f(x, y), f(x, z))$
and $f(g(x, y), z) = g(f(x, z), f(y, z))$. Note that the second condition is redundant
if f is symmetric.


Problem B.45. Let H be a t-norm or a t-conorm. Determine whether
$H(x, N(x)) = 0$ and/or $H(x, N(x)) = 1$ for all $x \in [0, 1]$ are valid for the
following cases:

1. $H = T_P$ or $H = S_P$;
2. $H = T_L$ or $H = S_L$;
3. $H = T_D$ or $H = S_D$.


Problem B.46. Show that the product t-norm has neither non-trivial idem- 
potent elements, nor zero divisors nor nilpotent elements. 


Problem B.47. Show that no element of ]0,1[ can be both an idempotent 
and nilpotent element of a t-norm. 


Problem B.48. Show that the set of idempotent elements of the nilpotent
minimum t-norm is $\{0\} \cup ]0.5, 1]$, the set of nilpotent elements is
$]0, 0.5]$ and the set of zero divisors is $]0, 1[$.


Problem B.49. Show that the following t-norm is non-continuous


$$
T(x, y) = \begin{cases}
\frac{xy}{2}, & \text{if } (x, y) \in [0, 1[^2, \\
\min(x, y), & \text{otherwise.}
\end{cases}
$$


Problem B.50. Justify by means of an example that:


1. An Archimedean t-norm need not be strictly monotone on $]0, 1]^2$;
2. A strictly monotone (on $]0, 1]^2$) t-norm need not be a strict t-norm.


Problem B.51. Show that if T is a strict t-norm then it has neither non- 
trivial idempotent elements nor zero divisors. 


Problem B.52. Determine a family of t-norms isomorphic to the product 
t-norm and another family isomorphic to the Lukasiewicz t-norm. 


Problem B.53. Determine the ordinal sum t-norm with summands ((0,1/ 
3,71), (2/3, 1, T2)) and T1, To given by Tı (x, y) = max( 4" 0), To(x, y) = 
a a Moreover, find the corresponding dual t-conorm. 


Problem B.54. Determine the t-norms whose additive generators are:

$$
g_1(t) = \begin{cases}
0, & \text{if } t = 1, \\
-\log \frac{t}{2}, & \text{otherwise,}
\end{cases}
$$


and

$$
g_2(t) = \begin{cases}
0, & \text{if } t = 1, \\
3 - t, & \text{if } t \in ]1/2, 1[, \\
5 - 2t, & \text{otherwise.}
\end{cases}
$$


Problem B.55. Prove the following statements:
1. A t-norm is a copula if and only if it satisfies the Lipschitz property.
2. Every associative quasi-copula is a copula, and hence it is a t-norm.


Problem B.56. Show that the following functions are copulas if $C$ is a
copula:


1. $C_1(x, y) = x - C(x, 1 - y)$;
2. $C_2(x, y) = y - C(1 - x, y)$;
3. $C_3(x, y) = x + y - 1 + C(1 - x, 1 - y)$.


Problem B.57. Show that the following function is a copula

$$
C_\lambda(x, y) = (\min(x, y))^{\lambda} \cdot (T_P(x, y))^{1 - \lambda}, \quad \lambda \in [0, 1].
$$


Problems for Chapter 4


Problem B.58. Prove that uninorms have averaging behavior in the region
$[0, e] \times [e, 1] \cup [e, 1] \times [0, e]$ (see p. 201).


Problem B.59. Prove that for any uninorm U it is U(0,1) € {0,1}. 


Problem B.60. Write down the expression of the weakest and the strongest
uninorms (Example on page 206) for inputs of dimension n.


Problem B.61. Prove that given a representable uninorm U with additive
generator u and neutral element $e \in ]0, 1[$, the function
$N_u(x) = u^{-1}(-u(x))$ is a strong negation with fixed point e and U is
self-dual with respect to $N_u$ (excluding the points (0,1) and (1,0)).


Problem B.62. Check the expression of the MYCIN’s uninorm (Example
4.4) rescaled to the domain [0,1], and prove that $T_P$ and $S_P$ are,
respectively, its underlying t-norm and t-conorm.


Problem B.63. Check that the PROSPECTOR’s combining function (see
Example 4.5), when rescaled to [0,1], coincides with the 3-Π function given
in Example 4.19.


Problem B.64. Given $\lambda > 0$ and the function
$u_\lambda : [0, 1] \to [-\infty, +\infty]$ defined by


$$
u_\lambda(t) = \log\left(-\frac{1}{\lambda} \log(1 - t)\right)
$$


determine the representable uninorm whose additive generator is $u_\lambda$ and
calculate its neutral element, underlying t-norm and underlying t-conorm.


Problem B.65. Prove that any binary nullnorm $V : [0, 1]^2 \to [0, 1]$ verifies


$$
\begin{aligned}
& \forall t \in [0, a], \quad V(t, 1) = a \\
& \forall t \in [a, 1], \quad V(t, 0) = a
\end{aligned}
$$


Problem B.66. Using the results of the previous problem, prove the following
statement (see page 215), valid for any nullnorm:


$$
\forall (x, y) \in [0, a] \times [a, 1] \cup [a, 1] \times [0, a], \quad V(x, y) = a
$$


Problem B.67. Check that idempotent nullnorms (page PTY) are self-dual 
with respect to any strong negation N with fixed point a. 


Problem B.68. Write down the expression of the Lukasiewicz nullnorm (ex- 
ample [4.28] on page 218) for inputs of dimension n. 


Problem B.69. Prove that, as stated in Proposition |4.31} generated func- 
tions, as given in Definition [4.29] are aggregation functions. 


Problem B.70. Check that the generating systems (g;, h) and (c gi, ho 44), 
c € R, c > 0, generate the same function (see Note [Z.32). 


Problem B.71. Let φ : [0,1] → [−∞, +∞] be a continuous strictly increasing 
function and let w be a weighting vector. As observed in Section 4.1, the 
generating system given by g_i(t) = w_i·φ(t) and h(t) = φ^{-1}(t) leads to a 
quasi-linear mean. Check that the generating system given by 
g_i(t) = w_i(aφ(t) + b) and h(t) = φ^{-1}((t − b)/a), where a, b ∈ ℝ, a ≠ 0, 
generates exactly the same quasi-linear mean. 
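
The claim can be tried numerically before proving it. The sketch below assumes Definition 4.29 in the form f(x) = h(g_1(x_1) + ... + g_n(x_n)), a weighting vector whose components sum to one, and the illustrative choice φ(t) = t^2; the values of a, b and the input vector are arbitrary.

```python
import math

def generated(gs, h, xs):
    """Generated function f(x) = h(g_1(x_1) + ... + g_n(x_n))."""
    return h(sum(g(x) for g, x in zip(gs, xs)))

def phi(t):
    """Assumed generator, strictly increasing on [0, 1]."""
    return t ** 2

def phi_inv(s):
    return math.sqrt(s)

w = [0.2, 0.5, 0.3]   # weighting vector (non-negative, sums to 1)
a, b = -3.0, 1.5      # any real a != 0 and any real b

# Original system: g_i = w_i * phi, h = phi^{-1}.
gs1 = [lambda t, wi=wi: wi * phi(t) for wi in w]

# Transformed system: g_i = w_i * (a * phi + b), h(t) = phi^{-1}((t - b) / a).
gs2 = [lambda t, wi=wi: wi * (a * phi(t) + b) for wi in w]

def h2(t):
    return phi_inv((t - b) / a)

xs = [0.9, 0.4, 0.1]
m1 = generated(gs1, phi_inv, xs)
m2 = generated(gs2, h2, xs)   # agrees with m1 up to rounding
```

The agreement relies on the weights summing to one, which is what lets the constant b pass through the sum unchanged.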


Problem B.72. Prove that a generated function is a weighted mean if and 
only if it can be generated by a system (g_i, h) such that g_i(t) = a_i·t + b_i 
and h(t) = a·t + b, with a, a_i, b, b_i ∈ ℝ, a, a_i > 0 (see Note 4.33). 


Problem B.73. Check (Note 4.35) that quasi-arithmetic means are the only 
generated functions that are both idempotent and symmetric. 


Problem B.74. Use the construction on p. 225 to build an aggregation 
function with neutral element e ∈ ]0,1[ that behaves as the Łukasiewicz 
t-norm T_L on [0, e]^2 and as the probabilistic sum S_P on [e, 1]^2. Is the 
resulting function associative? Is it continuous? 


Problem B.75. Check that, as stated in Note 4.52 on page 232, the T-S 
function generated by h(t) = a·g(t) + b, a, b ∈ ℝ, a ≠ 0, coincides with the 
T-S function generated by g. 


Problem B.76. Check the statement in Section       that T-S functions 
with generating function verifying g(0) = +∞ (respectively g(1) = +∞) have 
absorbing element a = 0 (respectively a = 1). 


Problem B.77. Prove (see Section       ) that the dual of a T-S function 
Q_{γ,T,S,g} with respect to an arbitrary strong negation N is also a T-S 
function, given by Q_{1−γ,S_d,T_d,g_d}, where S_d is the t-norm dual to S 
w.r.t. N, T_d is the t-conorm dual to T w.r.t. N and g_d = g ∘ N. 


Problem B.78. Show that, as stated in Note 4.55 on page 234, the function 
L_{1/2,T_P,S_P} is idempotent for inputs of dimension n = 2 but not for inputs 
of dimension n = 3. 
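
A numerical illustration of this claim (not a proof) is sketched below. It assumes the usual form of a linear convex T-S function, L_{γ,T,S}(x) = (1 − γ)·T(x) + γ·S(x), with the product t-norm T_P and the probabilistic sum S_P extended to n arguments by associativity; the test value t = 0.25 is arbitrary.

```python
import math

def t_prod(xs):
    """Product t-norm T_P on n inputs."""
    return math.prod(xs)

def s_prob(xs):
    """Probabilistic sum S_P on n inputs: 1 - prod(1 - x_i)."""
    return 1.0 - math.prod(1.0 - x for x in xs)

def linear_convex(xs, gamma=0.5):
    """L_{gamma, T_P, S_P}(x) = (1 - gamma) * T_P(x) + gamma * S_P(x)."""
    return (1.0 - gamma) * t_prod(xs) + gamma * s_prob(xs)

t = 0.25
two = linear_convex([t, t])       # equals t: idempotent for n = 2
three = linear_convex([t, t, t])  # differs from t: not idempotent for n = 3
```

Note that t = 0.5 would be a poor test point for n = 3, since the cubic (2t^3 − 3t^2 + 3t)/2 happens to return 0.5 there.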


Problem B.79. Section 4.5.2 shows the existence of binary idempotent linear 
convex T-S functions built by means of a non-idempotent t-norm and a 
non-idempotent t-conorm. Prove that this result can be generalized to the 
class of binary T-S functions by choosing a pair (T, S) verifying the functional 
equation (1 − γ)·g(T(x,y)) + γ·g(S(x,y)) = (1 − γ)·g(x) + γ·g(y). What 
kind of averaging functions is obtained? 


Problem B.80. Adapt the functional equation given in Problem B.79 to the 
case of exponential convex T-S functions with parameter γ = 1/2, and check 
that the pair (T^H_0, S_P), made of the Hamacher product and the 
probabilistic sum (see Chapter 3), verifies this equation (i.e., check that the 
function E_{1/2,T^H_0,S_P} is idempotent when n = 2). What happens when n = 3? 
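
The behaviour asked about here can be explored numerically. The sketch below assumes the usual form of an exponential convex T-S function, E_{γ,T,S}(x) = T(x)^{1−γ}·S(x)^γ, and computes the Hamacher product on n inputs through its additive generator (1 − t)/t, which for two inputs gives xy/(x + y − xy); the function names and test values are illustrative only.

```python
import math

def hamacher_product(xs):
    """Hamacher product on n inputs in ]0,1], via the generator (1 - t)/t."""
    return 1.0 / (1.0 + sum((1.0 - x) / x for x in xs))

def s_prob(xs):
    """Probabilistic sum on n inputs: 1 - prod(1 - x_i)."""
    return 1.0 - math.prod(1.0 - x for x in xs)

def exp_convex(xs, gamma=0.5):
    """E_{gamma, T, S}(x) = T(x)^(1 - gamma) * S(x)^gamma (assumed form)."""
    return hamacher_product(xs) ** (1.0 - gamma) * s_prob(xs) ** gamma

t = 0.5
two = exp_convex([t, t])        # equals t when n = 2
three = exp_convex([t, t, t])   # no longer equals t when n = 3
pair = exp_convex([0.2, 0.8])   # for n = 2 this is sqrt(0.2 * 0.8)
```

For two inputs the product T(x, y)·S(x, y) collapses to xy, which is why the n = 2 case reduces to the geometric mean in this sketch.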


Problem B.81. Prove that Q_{γ,T,S,N} with N(t) = 1 − t (Example 4.57 on 
page 235) is a linear convex T-S function, and that it is the only T-S function 
generated by a strong negation which belongs to this class of T-S functions. 


Problem B.82. Check that T-S functions generated by g(t) = log(N(t)), 
where N is a strong negation (see Example 4.58 on page 235), are never 
exponential convex T-S functions. 


Problem B.83. Check that, as stated in the introduction of Section 4.6, the 
classes of conjunctive and disjunctive functions are dual to each other and 
that the N-dual of an averaging (respectively mixed) function is in turn an 
averaging (respectively mixed) function. 


Problem B.84. Prove the characterizations of N-symmetric sums given in 
Propositions 4.63 and 4.67. 


Problem B.85. Check that, as stated in Note 4.72, when using Corollary 
4.68 to construct symmetric sums, any aggregation function g generates the 
same symmetric sum as its dual function g_d. Find an example proving that 
this does not happen in the case of Corollary 4.65. 


Problem B.86. Check the statements in Section 4.6.2 regarding N-symmetric 
sums with absorbing/neutral element, and use them to construct some func- 
tions of these types. 


Problem B.87. Prove that shift-invariant symmetric sums can be built by 
means of Corollary 4.68 starting from an arbitrary shift-invariant generating 
function g. 


Problem B.88. Prove the statements in Section 4.7.2 regarding the 
possession of absorbing and neutral elements of T-OWAs, S-OWAs and ST-OWAs. 


Problem B.89. Check (see duality item in Section 4.7.2) that T-OWAs and 
S-OWAs are dual to each other with respect to the standard negation and that 
the attitudinal character of a T-OWA and its dual S-OWA are complementary. 


Problem B.90. Prove (see duality item in Section 4.7.2) that the class of 
ST-OWAs is closed under duality. 


Problem B.91. Prove the following two statements regarding T-OWAs and 
S-OWAs: 


• A bivariate T-OWA function O_{T,w} is idempotent if and only if T = min 
or w = (1,0). 

• A bivariate S-OWA function O_{S,w} is idempotent if and only if S = max 
or w = (0,1). 
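
The first statement can be probed numerically before proving it. The sketch below assumes the T-OWA form O_{T,w}(x) = Σ w_i·T(x_(1), ..., x_(i)) with inputs sorted in non-increasing order and T(x_(1)) = x_(1), so that the bivariate case reads w_1·x_(1) + w_2·T(x_(1), x_(2)); the product t-norm and the weights are arbitrary illustrative choices.

```python
def t_owa2(x, y, w, tnorm):
    """Bivariate T-OWA: w1 * x_(1) + w2 * T(x_(1), x_(2)), with x_(1) >= x_(2)."""
    hi, lo = max(x, y), min(x, y)
    return w[0] * hi + w[1] * tnorm(hi, lo)

def product(x, y):
    """Product t-norm, used here as a non-idempotent example."""
    return x * y

grid = [i / 10 for i in range(11)]

def is_idempotent(w, tnorm):
    """Check O(t, t) = t on the grid, up to rounding."""
    return all(abs(t_owa2(t, t, w, tnorm) - t) < 1e-12 for t in grid)

idem_min = is_idempotent((0.3, 0.7), min)       # T = min: an ordinary OWA
idem_max = is_idempotent((1.0, 0.0), product)   # w = (1, 0): the maximum
idem_prod = is_idempotent((0.3, 0.7), product)  # neither condition holds
```

A grid check of this kind can only refute idempotency or make it plausible; the "only if" direction still requires the argument asked for in the problem.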